by

T.  W.  Anderson 
Stanford  University 

and 

Akimichi  Takemura 
University  of  Tokyo 

ABSTRACT 

The positive probability that an estimated moving average process is noninvertible is studied for maximum likelihood estimation of a univariate process. Upper and lower bounds for the probability in the first-order case are obtained, as well as limits when the sample size tends to infinity. Higher order moving average models and autoregressive moving average models are also treated.

Key words: moving average models, maximum likelihood estimation, noninvertible moving average, autoregressive moving average processes.
WHY DO NONINVERTIBLE ESTIMATED MOVING AVERAGES OCCUR?*

1. INTRODUCTION

In maximum likelihood estimation of a moving average process, noninvertible estimates frequently appear, both with actual data and in simulation studies. Kang (1975) showed how noninvertibility occurs in the moving average of order one and indicated why it should be expected with positive probability. Cryer and Ledolter (1981) and Davidson (1981a), (1981b) have investigated the probabilities in finite samples. Sargan and Bhargava (1983) and Pesaran (1983) have found the limit of the probability that a noninvertible value is a local maximum of the likelihood function when the true value is noninvertible. We develop, organize, and generalize these results. New theoretical results include: (i) a rigorous derivation of the limiting probabilities that the likelihood function attains a local maximum at a noninvertible value, for noninvertible (Theorem 4.1) and invertible (Theorem 4.2) processes; (ii) a lower bound for the probability that the likelihood function attains a global maximum at a noninvertible value (Corollary 5.1); and (iii) relations between maximum likelihood estimation and several least squares estimators (Theorem 5.1). Similar results are obtained for the moving average process of general order (Section 6) and for autoregressive moving average processes of general order (Section 8). It will be shown that, in general, the region of possible autocovariances for a time series of finite length T is larger than the region corresponding to moving average processes (T = ∞). As a result, there is a positive probability that the estimated process falls on the boundary of the region of moving average processes, that is, that the estimated process is noninvertible. This point will be clarified by studying the Jacobian matrix of the transformation from moving average coefficients to autocovariances (Theorem 6.1, Theorem 6.2, and Theorem 6.3) and by interpreting the results from a geometric viewpoint (Theorem 6.4).

The above general consideration is illustrated by the MA(2) process (Section 7) and the ARMA(1,1) process (Section 9).

*The research of the first author was carried out in part as Wesley Clair Mitchell Visiting Professor of Economics at Columbia University and at IBM Systems Research Institute.

2.  THE  MOVING  AVERAGE  PROCESS  OF  ORDER  ONE 

The topic we are studying can be illustrated by the simplest case, namely, the moving average of order one, designated as MA(1). Let {y_t} be a stochastic process defined by

$$ y_t = v_t + \alpha v_{t-1}, \qquad t = \dots, -1, 0, 1, \dots, \tag{1} $$

where {v_t} is a sequence of unobservable random variables with the properties

$$ \mathcal{E} v_t = 0, \qquad \mathcal{E} v_t^2 = \sigma^2, \qquad \mathcal{E} v_t v_s = 0, \quad t \neq s. \tag{2} $$

Then {y_t} is a stochastic process stationary in the wide sense. If the v_t's are independent and identically distributed, {y_t} is strictly stationary.

If |α| < 1, we can invert (1) to obtain the autoregressive representation (of infinite order)

$$ y_t - \alpha y_{t-1} + \alpha^2 y_{t-2} - \cdots = v_t. \tag{3} $$

The process is noninvertible if α = ±1. Then (3) does not converge and the expression is meaningless.

The autoregressive representation is important because it is used for prediction. The prediction of y_t from y_{t-1}, y_{t-2}, ... is

$$ \hat y_t = y_t - v_t = \alpha y_{t-1} - \alpha^2 y_{t-2} + \cdots, $$

known as exponential smoothing. Another reason for concern about invertibility is that iterative computational procedures may fail to converge if the estimate is ±1. Moreover, part of the appeal of the MA(1) model is that it approximates an autoregressive model whose coefficients decrease roughly exponentially.


The first- and second-order moments of the observable process {y_t} can be obtained from (1) and (2). The mean is

$$ \mathcal{E} y_t = 0. \tag{4} $$

The autocovariance sequence is

$$ \begin{aligned} \mathcal{E} y_t^2 &= \mathcal{E}(v_t + \alpha v_{t-1})^2 = \sigma^2(1+\alpha^2) = \sigma(0), \\ \mathcal{E} y_t y_{t-1} &= \sigma^2\alpha = \sigma(1), \\ \mathcal{E} y_t y_{t-h} &= 0 = \sigma(h), \qquad h = 2, 3, \dots, \end{aligned} \tag{5} $$

and σ(-h) = σ(h). The first-order autocorrelation is

$$ \rho = \frac{\sigma(1)}{\sigma(0)} = \frac{\sigma^2\alpha}{\sigma^2(1+\alpha^2)} = \frac{\alpha}{1+\alpha^2}. \tag{6} $$

The other autocorrelations are 0. If α is replaced by its reciprocal,

$$ \frac{1/\alpha}{1 + (1/\alpha)^2} = \frac{\alpha}{1+\alpha^2}, \tag{7} $$

the autocorrelation is unchanged. We can therefore restrict α to -1 ≤ α ≤ 1 without loss of generality as far as first- and second-order moments are concerned.

We shall assume that {y_t} is Gaussian, that is, all joint distributions are normal. Then the moments (4) and (5) completely describe the process. We note that ρ, as a function of α, is monotonically increasing in the interval [-1, 1] and satisfies the inequality

$$ -\tfrac12 \le \rho \le \tfrac12. \tag{8} $$
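A few lines of Python (a sketch added for illustration, not part of the original analysis) make (6)-(8) concrete. The helper name `rho` is hypothetical; it evaluates ρ = α/(1+α²), checks the reciprocal invariance (7), and checks on a grid that ρ is increasing on [-1, 1] with extreme values ±½.

```python
def rho(alpha: float) -> float:
    """First-order autocorrelation of y_t = v_t + alpha * v_{t-1}, eq. (6)."""
    return alpha / (1.0 + alpha ** 2)

# alpha and 1/alpha give the same autocorrelation, eq. (7)
for a in (0.25, 0.5, 0.8, -0.6):
    assert abs(rho(a) - rho(1.0 / a)) < 1e-12

# rho is increasing on [-1, 1] and attains -1/2 and 1/2 at the endpoints, eq. (8)
grid = [i / 1000 - 1.0 for i in range(2001)]
values = [rho(a) for a in grid]
assert all(v1 < v2 for v1, v2 in zip(values, values[1:]))
print(min(values), max(values))  # -0.5 0.5
```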
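3. MAXIMUM LIKELIHOOD ESTIMATION FOR THE FIRST-ORDER MOVING AVERAGE

The observations on {y_t} at T successive time points constitute a vector

$$ y = (y_1, \dots, y_T)'. \tag{8} $$

It is an observation from a normal distribution with mean 0 and covariance matrix

$$ \Sigma = \sigma(0)\, R, \tag{9} $$

where

$$ R = I_T + 2\rho A \tag{10} $$

and A is the T × T matrix with 1/2 on the first superdiagonal and the first subdiagonal and 0 elsewhere,

$$ A = \begin{pmatrix} 0 & \tfrac12 & 0 & \cdots & 0 \\ \tfrac12 & 0 & \tfrac12 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & \tfrac12 & 0 & \tfrac12 \\ 0 & \cdots & 0 & \tfrac12 & 0 \end{pmatrix}. \tag{11} $$

Then the logarithm of the likelihood function is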
$$ \log L = -\frac{T}{2}\log 2\pi - \frac{T}{2}\log\sigma(0) - \frac12\log|R| - \frac{1}{2\sigma(0)}\, y'R^{-1}y. \tag{12} $$

For given ρ, the value of σ(0) that maximizes the likelihood function is

$$ \hat\sigma(0) = \frac{y'R^{-1}y}{T}. \tag{13} $$

Then twice the logarithm of the concentrated likelihood function is (except for constants)

$$ M(\rho) = -\log|R| - T\log y'R^{-1}y. \tag{14} $$

If y ≠ 0, a maximum of M[ρ(α)] with respect to α exists [Anderson and Mentz (1980)]. The derivative equation is

$$ \frac{dM[\rho(\alpha)]}{d\alpha} = \frac{dM}{d\rho}\,\frac{d\rho}{d\alpha} = \frac{dM}{d\rho}\,\frac{1-\alpha^2}{(1+\alpha^2)^2}. \tag{15} $$

The derivative is 0 at α = ±1 [Kang (1975)]. The question is: when is α = 1 or -1 a maximum?

If y ≠ 0, the maximum of M(ρ) exists such that R is positive definite. It will be shown that R is positive definite for -a < ρ < a, where ½ < a ≤ 1. If dM(ρ)/dρ ≠ 0 for all ρ in the interval [-½, ½], then α = 1 or -1 yields a maximum. The maximum can also be at α = 1 or -1 if dM(ρ)/dρ = 0 for some values of ρ in the interval [-½, ½].
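A local maximum occurs at α = 1 if and only if

$$ 0 > \left.\frac{d^2 M[\rho(\alpha)]}{d\alpha^2}\right|_{\alpha=1} = \left.\frac{d}{d\alpha}\left[\frac{dM}{d\rho}\,\frac{d\rho}{d\alpha}\right]\right|_{\alpha=1} = \left[\frac{d^2 M}{d\rho^2}\left(\frac{d\rho}{d\alpha}\right)^2 + \frac{dM}{d\rho}\,\frac{d^2\rho}{d\alpha^2}\right]_{\alpha=1} = -\frac12\left.\frac{dM}{d\rho}\right|_{\rho=1/2}, \tag{16} $$

because dρ/dα = 0 at α = 1 and d²ρ/dα² = -½ at α = 1. Thus a local maximum occurs at α = 1 if and only if

$$ \left.\frac{dM}{d\rho}\right|_{\rho=1/2} > 0. \tag{17} $$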

To study the probabilities of maxima it is convenient to put the concentrated likelihood in a canonical form. Let P = (p_st), where

$$ p_{st} = \sqrt{\frac{2}{T+1}}\,\sin\frac{\pi s t}{T+1}, \qquad s, t = 1, \dots, T. \tag{18} $$

Then P'AP = D is diagonal and the diagonal elements are

$$ d_t = \cos\frac{\pi t}{T+1}, \qquad t = 1, \dots, T \tag{19} $$

[Anderson (1971), Sec. 6.5]. The roots can be visualized by dividing one-half of the unit circle into T+1 equal parts and projecting the points on the circumference onto the diameter. Then

$$ P'RP = I_T + 2\rho D, \tag{20} $$
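The canonical form can be checked numerically. The sketch below assumes NumPy; `tridiag_A` and `sine_basis` are illustrative names for the matrices A of (11) and P of (18). It verifies that P is orthogonal, that P'AP is the diagonal matrix of the d_t in (19), and that |R| equals the product of the eigenvalues 1 + 2ρd_t.

```python
import numpy as np

def tridiag_A(T: int) -> np.ndarray:
    """T x T matrix A of (11): 1/2 on the first sub- and superdiagonals, 0 elsewhere."""
    return 0.5 * (np.eye(T, k=1) + np.eye(T, k=-1))

def sine_basis(T: int) -> np.ndarray:
    """Orthogonal matrix P of (18): p_st = sqrt(2/(T+1)) sin(pi s t / (T+1))."""
    s = np.arange(1, T + 1)
    return np.sqrt(2.0 / (T + 1)) * np.sin(np.pi * np.outer(s, s) / (T + 1))

T = 6
A, P = tridiag_A(T), sine_basis(T)
d = np.cos(np.pi * np.arange(1, T + 1) / (T + 1))     # roots (19)
assert np.allclose(P.T @ P, np.eye(T))               # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(d))          # P'AP = D
rho = 0.3
R = np.eye(T) + 2 * rho * A                          # (10)
assert np.isclose(np.linalg.det(R), np.prod(1 + 2 * rho * d))
```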
$$ |R| = \prod_{t=1}^T (1 + 2\rho d_t), \tag{21} $$

$$ y'R^{-1}y = \sum_{t=1}^T \frac{z_t^2}{1 + 2\rho d_t}, \tag{22} $$

where

$$ y = Pz. \tag{23} $$

Since y has the distribution N[0, σ(0)R], z has the distribution N[0, σ(0)(I_T + 2ρD)]. In terms of z, the concentrated likelihood (14) becomes (except for constants)

$$ M(\rho) = -\sum_{t=1}^T \log(1 + 2\rho d_t) - T\log\sum_{t=1}^T \frac{z_t^2}{1 + 2\rho d_t}. \tag{24} $$

For R to be positive definite, I_T + 2ρD must be positive definite. Thus 1 + 2ρd_t > 0, t = 1, ..., T. These imply

$$ -\frac{1}{2\cos\frac{\pi}{T+1}} < \rho < \frac{1}{2\cos\frac{\pi}{T+1}}. \tag{25} $$

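The maximum likelihood estimator of ρ is a solution of the equation obtained by setting the derivative of the logarithm of the likelihood function equal to 0. The derivative is

$$ \frac{dM}{d\rho} = -\sum_{t=1}^T \frac{2d_t}{1 + 2\rho d_t} + 2T\,\frac{\displaystyle\sum_{t=1}^T \frac{d_t z_t^2}{(1 + 2\rho d_t)^2}}{\displaystyle\sum_{t=1}^T \frac{z_t^2}{1 + 2\rho d_t}}. \tag{26} $$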
The likelihood equation is

$$ T\sum_{t=1}^T \frac{d_t z_t^2}{(1 + 2\rho d_t)^2} - \sum_{t=1}^T \frac{d_t}{1 + 2\rho d_t}\,\sum_{t=1}^T \frac{z_t^2}{1 + 2\rho d_t} = 0. \tag{27} $$

The left-hand side, multiplied by the product of the factors (1 + 2ρd_t)², t = 1, ..., T, is a polynomial in ρ of degree 2T - 3 or less.
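As a check on (24) and (26), the sketch below (assuming NumPy; `M` and `dM` are illustrative names, and the simulated z is arbitrary) compares the analytic derivative with a central finite difference, including at the endpoint ρ = ½ that matters for (17).

```python
import numpy as np

def M(rho, z, d):
    """Concentrated log likelihood (24), additive constants dropped."""
    w = 1 + 2 * rho * d
    return -np.sum(np.log(w)) - len(z) * np.log(np.sum(z ** 2 / w))

def dM(rho, z, d):
    """Analytic derivative (26)."""
    w = 1 + 2 * rho * d
    T = len(z)
    return -np.sum(2 * d / w) + 2 * T * np.sum(d * z ** 2 / w ** 2) / np.sum(z ** 2 / w)

rng = np.random.default_rng(0)
T = 8
d = np.cos(np.pi * np.arange(1, T + 1) / (T + 1))     # (19)
z = rng.standard_normal(T)                             # any nonzero z will do
h = 1e-6
for rho in (-0.4, 0.0, 0.25, 0.5):
    central_diff = (M(rho + h, z, d) - M(rho - h, z, d)) / (2 * h)
    assert abs(central_diff - dM(rho, z, d)) < 1e-5
# by (17), True means alpha = 1 is a local maximum for this particular z
print(dM(0.5, z, d) > 0)
```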

4.  A  LOCAL  MAXIMUM  AT  A  NONINVERTIBLE  VALUE 

The probability of a local maximum at α = 1 is

$$ \Pr\left\{\left.\frac{dM}{d\rho}\right|_{\rho=1/2} > 0\right\} = \Pr\left\{\sum_{t=1}^T \frac{(T+2)d_t + T - 1}{(1+d_t)^2}\,z_t^2 > 0\right\}. \tag{28} $$

We have made use of the fact (proved in A1 in the appendix) that

$$ \sum_{t=1}^T \frac{d_t}{1 + d_t} = -\frac{T(T-1)}{3}. \tag{29} $$

If we let

$$ x_t = \frac{z_t}{\sqrt{\sigma(0)(1 + 2\rho d_t)}}, \tag{30} $$

then the x_t's are independent, each with the distribution N(0, 1), and (28) is

$$ \Pr\left\{\sum_{t=1}^T \frac{(T+2)d_t + T - 1}{(1+d_t)^2}\,(1 + 2\rho d_t)\,x_t^2 > 0\right\}. \tag{31} $$
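The probability (31) involves only independent N(0, 1) variables, so it is easy to approximate by simulation. The sketch below is our illustration (not the numerical method used by Pesaran or by Cryer and Ledolter); it assumes NumPy, and `prob_local_max_at_one` is a hypothetical name. It also checks the identity (29) for a range of T.

```python
import numpy as np

def prob_local_max_at_one(alpha, T, n_rep=200_000, seed=0):
    """Monte Carlo estimate of (31), the probability that alpha = 1 is a local maximum."""
    rho = alpha / (1 + alpha ** 2)
    d = np.cos(np.pi * np.arange(1, T + 1) / (T + 1))
    coef = ((T + 2) * d + T - 1) / (1 + d) ** 2 * (1 + 2 * rho * d)
    x = np.random.default_rng(seed).standard_normal((n_rep, T))
    return float(np.mean((x ** 2) @ coef > 0))

# identity (29): sum_t d_t / (1 + d_t) = -T(T - 1)/3
for T in range(2, 30):
    d = np.cos(np.pi * np.arange(1, T + 1) / (T + 1))
    assert np.isclose(np.sum(d / (1 + d)), -T * (T - 1) / 3)

print(prob_local_max_at_one(0.0, T=2))    # the T = 2, alpha = 0 entry of Table 2 is .333
```

Simulation error is of order 0.001 with 200,000 replications, which is enough to compare with three-decimal tables.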


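As a simple case consider T = 2, for which d_1 = ½ and d_2 = -½. The concentrated likelihood function is (except for a constant factor)

$$ e^{\frac12 M(\rho)} = \frac{(1-\rho^2)^{1/2}}{(1-\rho)z_1^2 + (1+\rho)z_2^2} = \frac{1}{\frac{1}{\gamma}z_1^2 + \gamma z_2^2}, \tag{32} $$

where

$$ \gamma = \sqrt{\frac{1+\rho}{1-\rho}}. \tag{33} $$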

Then γ is a monotonically increasing function of ρ. A small table of this function is

Table 1

  ρ     -1     -1/2     0     1/2     1
  γ      0     1/√3     1     √3      ∞

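The likelihood function is unimodal (Figure 1). Its unique maximum over γ > 0 is at

$$ \hat\gamma = \left|\frac{z_1}{z_2}\right|. \tag{34} $$

Figure 1. Likelihood function for MA(1), T = 2

Since -1 ≤ α ≤ 1 corresponds to 1/√3 ≤ γ ≤ √3, the maximum is at α = 1 when γ̂ ≥ √3. Thus the probability of the maximum at α = 1 is

$$ \Pr\{\hat\alpha = 1\} = \Pr\{\hat\gamma > \sqrt3\}. \tag{35} $$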


The probability (31) has been evaluated for different values of ρ (or α) by several numerical techniques by Pesaran (1983) and by Cryer and Ledolter (1981). Table 2 of probabilities of local maxima was given by the latter. Because of symmetry, the probability that α̂ = -1 is a local maximum for a process coefficient α is the probability that α̂ = 1 is a local maximum for the process parameter -α.

Table 2

Probabilities of Local Maxima at 1 and -1

             T = 2          T = 10         T = 25
   α      at 1  at -1    at 1  at -1    at 1  at -1
  0.0     .333  .333     .102  .102     .015  .015
  0.2     .389  .282     .145  .076     .025  .011
  0.4     .440  .244     .223  .062     .050  .009
  0.6     .476  .220     .353  .056     .119  .008
  0.8     .495  .208     .533  .053     .319  .008
  1.0     .500  .205     .637  .053     .649  .007
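For T = 2 the probability (35) has a closed form, which gives a check on the first two columns of Table 2. By (30), z_1/z_2 = γ x_1/x_2, where x_1 and x_2 are independent N(0, 1). The ratio x_1/x_2 is therefore standard Cauchy, and Pr{γ̂ > √3} = 1 - (2/π) arctan(√3/γ). The sketch below is our derivation from (33) and (35), and the function name is illustrative. It uses the symmetry noted above for the -1 column.

```python
import math

def pr_max_at_one_T2(alpha: float) -> float:
    """Exact Pr{alpha_hat = 1} for T = 2 from (35); x1/x2 is standard Cauchy."""
    rho = alpha / (1 + alpha ** 2)
    gamma = math.sqrt((1 + rho) / (1 - rho))             # (33)
    return 1 - (2 / math.pi) * math.atan(math.sqrt(3) / gamma)

for alpha in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    at_plus = pr_max_at_one_T2(alpha)
    at_minus = pr_max_at_one_T2(-alpha)                   # symmetry: process -alpha
    print(f"{alpha:.1f}  {at_plus:.3f}  {at_minus:.3f}")
```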

Now we consider large sample sizes. First we find the limiting probability of a relative maximum at α = 1 when ρ = ½ (α = 1).

Theorem 4.1:

$$ \lim_{T\to\infty}\Pr\left\{\sum_{t=1}^T \frac{(T+2)d_t + T - 1}{1 + d_t}\,x_t^2 > 0\right\} = \Pr\left\{W^2 < \tfrac16\right\} = .6575, $$

where W² is the limiting form of the Cramér-von Mises statistic.
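Proof: For ρ = ½, (31) is

$$ \Pr\left\{\sum_{t=1}^T\left[(T+2) - \frac{3}{1+d_t}\right]x_t^2 > 0\right\} = \Pr\left\{\frac{3}{T(T+2)}\sum_{t=1}^T \frac{x_t^2}{1+d_t} < \frac1T\sum_{t=1}^T x_t^2\right\}. \tag{36} $$

Let K_T be a sequence of integers such that K_T → ∞ as T → ∞, K_T/T → 0, and K_T²/T → ∞. Then write (36) as

$$ \Pr\left\{\sum_{t=T-K_T}^{T}\left[\frac{3}{T(T+2)(1+d_t)} - \frac1T\right]x_t^2 < \sum_{t=1}^{T-K_T-1}\left[\frac1T - \frac{3}{T(T+2)(1+d_t)}\right]x_t^2\right\}. \tag{37} $$

The two sides of the inequality in (37) are independent. For k = 0, 1, ...,

$$ 1 + d_{T-k} = 1 + \cos\frac{\pi(T-k)}{T+1} = 1 - \cos\frac{\pi(k+1)}{T+1} = \frac12\left[\frac{\pi(k+1)}{T+1}\right]^2 + O(T^{-4}) = \frac{\pi^2(k+1)^2}{2(T+1)^2} + O(T^{-4}). \tag{38} $$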
Then the coefficient of x²_{T-k} on the left-hand side of (37) is

$$ \frac{3}{T(T+2)(1+d_{T-k})} - \frac1T = \frac{6(T+1)^2}{\pi^2 T(T+2)}\cdot\frac{1}{(k+1)^2} + O(T^{-1}) = \frac{6}{\pi^2(k+1)^2} + O(T^{-1}). \tag{39} $$

The left-hand side of the inequality in (37) has the limiting distribution of

$$ 6\sum_{j=1}^\infty \frac{x_j^2}{\pi^2 j^2} = 6W^2, \tag{40} $$

where W² is the limiting form of the Cramér-von Mises statistic and the x_j's are independent N(0, 1) variables.

Since cos θ is a decreasing function of θ (0 < θ < π), for t = 1, ..., T - K_T - 1,

$$ \frac{3}{T(T+2)(1+d_t)} < \frac{3}{T(T+2)\left[1 + \cos\frac{\pi(T-K_T)}{T+1}\right]}. \tag{41} $$

Hence the right-hand side of the inequality in (37) is

$$ \frac1T\sum_{t=1}^{T-K_T-1}x_t^2 - \sum_{t=1}^{T-K_T-1}\frac{3\,x_t^2}{T(T+2)(1+d_t)} = \frac1T\sum_{t=1}^{T-K_T-1}x_t^2 + O_p\!\left(\frac{T}{K_T^2}\right), \tag{42} $$

which has 1 as a probability limit. Hence the limit of (37) is Pr{6W² < 1} = Pr{W² < 1/6}. The value of this, by interpolation in the table of the distribution function given by Anderson and Darling (1952), is .6575. Q.E.D.
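The constant .6575 can also be approximated by simulation from the series representation (40). The sketch below assumes NumPy, and `prob_w2_below` is a hypothetical name. It truncates the series after `n_terms` terms and replaces the omitted tail by its mean, using Σ_j 1/(π²j²) = 1/6. This is only a Monte Carlo approximation, not the interpolation in the Anderson and Darling (1952) table used in the text.

```python
import numpy as np

def prob_w2_below(c, n_terms=100, n_rep=200_000, seed=1, chunk=20_000):
    """Monte Carlo estimate of Pr{W^2 < c}, with W^2 = sum_j x_j^2 / (pi^2 j^2) as in (40).

    The series is truncated after n_terms terms and the omitted tail is replaced
    by its mean, 1/6 - sum_{j <= n_terms} 1/(pi^2 j^2).
    """
    rng = np.random.default_rng(seed)
    weights = 1.0 / (np.pi ** 2 * np.arange(1, n_terms + 1) ** 2)
    tail_mean = 1.0 / 6.0 - weights.sum()
    n_chunks = n_rep // chunk
    hits = 0
    for _ in range(n_chunks):                      # chunks keep memory use modest
        x = rng.standard_normal((chunk, n_terms))
        w2 = (x ** 2) @ weights + tail_mean
        hits += np.count_nonzero(w2 < c)
    return hits / (n_chunks * chunk)

print(prob_w2_below(1 / 6))    # Theorem 4.1 gives .6575
```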
Sargan and Bhargava (1983) have obtained this result by a different method; Pesaran (1983) has used a somewhat similar method.

We shall now show that if the MA(1) process is invertible, the probability of a local maximum at α = 1 or α = -1 goes to 0 faster than T^{-n} for any n.


Theorem 4.2: Let -½ < ρ < ½ be fixed. Then for any n

$$ \lim_{T\to\infty} T^n \Pr\left\{\sum_{t=1}^T \frac{(T+2)d_t + T - 1}{(1+d_t)^2}\,(1 + 2\rho d_t)\,x_t^2 > 0\right\} = 0. \tag{43} $$


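Proof: Let

$$ w_t = -\frac{(T+2)d_t + T - 1}{(1+d_t)^2} = \frac{3}{(1+d_t)^2} - \frac{T+2}{1+d_t}. \tag{44} $$

Then the probability on the left-hand side of (43) is

$$ \Pr\left\{\sum_{t=1}^T w_t(1 + 2\rho d_t)\,x_t^2 < 0\right\}. \tag{45} $$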


We first investigate the behavior of w_t. Let

$$ f(x) = \frac{3}{(1+x)^2} - \frac{T+2}{1+x} = -\frac{(T+2)x + T - 1}{(1+x)^2}, \qquad x > -1. \tag{46} $$

As x → -1, f(x) → ∞. We have

$$ f(x) \begin{cases} > 0 & \text{if } x < -\dfrac{T-1}{T+2}, \\[4pt] = 0 & \text{if } x = -\dfrac{T-1}{T+2}, \\[4pt] < 0 & \text{if } x > -\dfrac{T-1}{T+2}, \end{cases} \tag{47} $$

$$ f'(x) = -\frac{6}{(1+x)^3} + \frac{T+2}{(1+x)^2} = \frac{1}{(1+x)^3}\{(T+2)(1+x) - 6\} = \frac{1}{(1+x)^3}\{x(T+2) + (T-4)\}. \tag{48} $$

This shows that f is decreasing to the left of -(T-4)/(T+2) and increasing to the right of -(T-4)/(T+2). See Figure 2.


Figure  2. 
Also note that the minimum of f is

$$ f\!\left(-\frac{T-4}{T+2}\right) = -\frac{(T+2)\left(-\frac{T-4}{T+2}\right) + T - 1}{\left(1 - \frac{T-4}{T+2}\right)^2} = -\frac{3}{\left(\frac{6}{T+2}\right)^2} = -\frac{(T+2)^2}{12}. \tag{49} $$

Separating positive and negative weights, we write (45) as

$$ \Pr\left\{\sum_{w_t>0} w_t(1 + 2\rho d_t)\,x_t^2 < \sum_{w_t<0}(-w_t)(1 + 2\rho d_t)\,x_t^2\right\}. \tag{50} $$

To bound this probability from above, we try to make the left-hand side smaller and the right-hand side larger. Note that now the weights are all positive.


Right-hand side. We saw that w_t ≥ -(T+2)²/12. Hence -w_t ≤ (T+2)²/12. Also

$$ 1 + 2\rho d_t < 1 + 2|\rho| < 2. \tag{51} $$

Hence the right-hand side is bounded from above by

$$ \frac{(T+2)^2}{6}\sum_{w_t<0} x_t^2. \tag{52} $$
Left-hand side. w_t is positive for t = T, T-1, .... By taking only a finite number k of terms on the left-hand side we decrease it; that is,

$$ \sum_{w_t>0} w_t(1 + 2\rho d_t)\,x_t^2 \ge \sum_{j=0}^{k-1} w_{T-j}(1 + 2\rho d_{T-j})\,x_{T-j}^2. \tag{53} $$

(See A2 in the Appendix for verification that the left-hand side of (53) contains at least k terms for large T.)

Now we look at the weights

$$ w_{T-j}, \qquad j = 0, \dots, k-1. \tag{54} $$

w_{T-j} is decreasing in j; hence

$$ w_{T-j} \ge w_{T-k+1}, \qquad j = 0, \dots, k-1. \tag{55} $$

Also

$$ 1 + 2\rho d_{T-j} \ge 1 - 2|\rho|. \tag{56} $$

Hence

$$ \sum_{j=0}^{k-1} w_{T-j}(1 + 2\rho d_{T-j})\,x_{T-j}^2 \ge (1 - 2|\rho|)\,w_{T-k+1}\sum_{j=0}^{k-1} x_{T-j}^2. \tag{57} $$
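Combining these results, we find that (50) is less than or equal to

$$ \Pr\left\{(1 - 2|\rho|)\,w_{T-k+1}\,\chi_k^2 < \frac{(T+2)^2}{6}\,\chi_{T'}^2\right\} = \Pr\left\{\chi_k^2 < c_T\,\chi_{T'}^2\right\}, \tag{58} $$

where χ²_k and χ²_{T'} are independent chi-square random variables with k and T' (= number of w_t < 0) degrees of freedom. Hence (58) is

$$ \mathcal{E}\int_0^{c_T\chi_{T'}^2} c_1\,x^{\frac k2 - 1}e^{-\frac x2}\,dx \le \mathcal{E}\int_0^{c_T\chi_{T'}^2} c_1\,x^{\frac k2 - 1}\,dx = c_2\,c_T^{k/2}\,\mathcal{E}\left(\chi_{T'}^2\right)^{k/2}, \tag{59} $$

where c_1 = [2^{k/2}Γ(k/2)]^{-1}, c_2 = 2c_1/k, and

$$ c_T = \frac{(T+2)^2}{6(1 - 2|\rho|)\,w_{T-k+1}}. \tag{60} $$

Then for k even the right-hand side of (59) is

$$ c_2\,c_T^{k/2}\,T'(T'+2)\cdots(T'+k-2). \tag{61} $$

There are k/2 factors in the product, which is therefore O(T^{k/2}) because T' = O(T). However,

$$ w_{T-k+1} = \frac{3}{(1+d_{T-k+1})^2} - \frac{T+2}{1+d_{T-k+1}}, \tag{62} $$

where, by (38), 1 + d_{T-k+1} is of exact order T^{-2}, so the first term is of exact order T⁴ and the second is O(T³). Hence w_{T-k+1}^{-1} = O(T^{-4}) and c_T = O(T^{-2}), so that c_T^{k/2} = O(T^{-k}). Combining these, we find that (50) is O(T^{-k/2}). Since k is arbitrary, this completes the proof. Q.E.D.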


5.  LEAST  SQUARES  ESTIMATION  FOR  THE  FIRST-ORDER  MOVING  AVERAGE 

We now consider two kinds of least squares estimators and investigate the relation between these and the maximum likelihood estimator. As a corollary we shall obtain a lower bound for the probability that the likelihood function attains a global maximum at α = 1. There are two ways of parametrizing the process. The parametrization (σ(0), ρ) has been used above. Another parametrization is (σ², α), where σ² = $\mathcal{E}v_t^2$ is the variance of the disturbance term. Let Q = (1+α²)I_T + 2αA. Then the logarithm of the likelihood function is

$$ \log L = -\frac T2\log 2\pi - \frac T2\log\sigma^2 - \frac12\log|Q| - \frac{1}{2\sigma^2}\,y'Q^{-1}y. \tag{63} $$

Maximizing with respect to σ², we obtain the concentrated likelihood function

$$ M_{II}(\alpha) = -\log|Q| - T\log y'Q^{-1}y. \tag{64} $$

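Ignoring the determinant term, consider two quadratic forms

$$ S_I(\rho) = y'R^{-1}y, \tag{65} $$

$$ S_{II}(\alpha) = y'Q^{-1}y. \tag{66} $$

Let $\hat\alpha_{LS,I}$ and $\hat\alpha_{LS,II}$ be the estimators which minimize S_I[ρ(α)] and S_II(α), respectively:

$$ \min_{-1\le\alpha\le1} S_I[\rho(\alpha)] = \min_{-1\le\alpha\le1} S_I\!\left[\frac{\alpha}{1+\alpha^2}\right] \ \text{at}\ \hat\alpha_{LS,I}, \qquad \min_{-1\le\alpha\le1} S_{II}(\alpha) \ \text{at}\ \hat\alpha_{LS,II}. \tag{67} $$

Furthermore, let $\hat\alpha_{ML}$ denote the maximum likelihood estimator.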


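Theorem 5.1.

$$ |\hat\alpha_{LS,I}| \le |\hat\alpha_{ML}| \le |\hat\alpha_{LS,II}|, \tag{68} $$

and, with probability 1,

$$ \hat\alpha_{LS,I} = 1 \;\Rightarrow\; \hat\alpha_{ML} = 1 \;\Rightarrow\; \hat\alpha_{LS,II} = 1, \tag{69} $$

and similarly with 1 replaced by -1.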


Proof. Since d_{T+1-t} = -d_t, we have log|R| = Σ_t log(1 + 2ρd_t) = Σ_{t=1}^{[T/2]} log(1 - 4ρ²d_t²). Clearly log|R| is strictly decreasing in ρ². Let $\hat\rho_{LS,I} = \hat\alpha_{LS,I}/(1 + \hat\alpha_{LS,I}^2)$. Then $S_I(\hat\rho_{LS,I}) \le S_I(\rho)$ for all ρ. Hence for all |ρ| < $|\hat\rho_{LS,I}|$ we have

$$ M(\rho) - M(\hat\rho_{LS,I}) = \left(-\log|R(\rho)| + \log|R(\hat\rho_{LS,I})|\right) - T\left[\log S_I(\rho) - \log S_I(\hat\rho_{LS,I})\right] < 0. $$

This implies that $|\hat\alpha_{ML}| \ge |\hat\alpha_{LS,I}|$. To prove the second inequality, consider log|Q|. Since |Q| = 1 + α² + ... + α^{2T}, log|Q| is strictly increasing in α². Hence an argument similar to the one above yields $|\hat\alpha_{ML}| \le |\hat\alpha_{LS,II}|$. This proves (68). (69) follows from (68) by virtue of the fact

$$ 0 = \Pr[S_I(\tfrac12) = S_I(-\tfrac12)] = \Pr[M(\tfrac12) = M(-\tfrac12)] = \Pr[S_{II}(1) = S_{II}(-1)]. $$
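Theorem 5.1 can be illustrated by grid search on simulated data. In the sketch below, which assumes NumPy, several choices are ours and illustrative: the function name, the grid α = k/2000, the sample size T = 20, and the true α = 0.7. The three estimators are computed on a common grid through the canonical form (18)-(24). The argument in the proof applies to any common set of candidate values, so inequality (68) should hold in every replication. A tiny slack is allowed for floating-point rounding.

```python
import numpy as np

def estimates_on_grid(y, n_half=2000):
    """Grid-search versions of alpha_LS,I, alpha_ML and alpha_LS,II over [-1, 1]."""
    T = len(y)
    s = np.arange(1, T + 1)
    P = np.sqrt(2.0 / (T + 1)) * np.sin(np.pi * np.outer(s, s) / (T + 1))  # (18)
    d = np.cos(np.pi * s / (T + 1))                                         # (19)
    z = P.T @ y                                                             # y = Pz, (23)
    alpha = np.arange(-n_half, n_half + 1) / n_half     # symmetric grid on [-1, 1]
    rho = alpha / (1 + alpha ** 2)
    w = 1 + 2 * np.outer(rho, d)                        # eigenvalues of R, one row per alpha
    S_I = (z ** 2 / w).sum(axis=1)                      # (65) via (22)
    M = -np.log(w).sum(axis=1) - T * np.log(S_I)        # (14) via (21)-(22); equals M_II(alpha)
    S_II = S_I / (1 + alpha ** 2)                       # (66), since Q = (1 + alpha^2) R
    return alpha[np.argmin(S_I)], alpha[np.argmax(M)], alpha[np.argmin(S_II)]

rng = np.random.default_rng(2)
counts = {"LS,I": 0, "ML": 0, "LS,II": 0}
for _ in range(200):
    v = rng.standard_normal(21)
    y = v[1:] + 0.7 * v[:-1]                            # MA(1) sample, alpha = 0.7, T = 20
    a1, a_ml, a2 = estimates_on_grid(y)
    assert abs(a1) <= abs(a_ml) + 1e-12 and abs(a_ml) <= abs(a2) + 1e-12   # (68)
    counts["LS,I"] += abs(a1) == 1
    counts["ML"] += abs(a_ml) == 1
    counts["LS,II"] += abs(a2) == 1
print(counts)   # numbers of noninvertible estimates for the three methods
```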


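Corollary 5.1.

$$ \Pr\{\hat\alpha_{ML} = 1\} \ge \Pr\{\hat\alpha_{LS,I} = 1\} = \Pr\left\{\sum_t \frac{d_t}{(1+d_t)^2}\,z_t^2 > 0\right\}. \tag{70} $$

Proof: The first inequality is an immediate consequence of (69). Now the event $\hat\alpha_{LS,I} = 1$ is equivalent to S_I(ρ) > S_I(½) for all ρ ∈ [-½, ½), which is

$$ \sum_t \frac{z_t^2}{1 + 2\rho d_t} > \sum_t \frac{z_t^2}{1 + d_t}, \qquad \forall\rho, \tag{71} $$

or, since 1 - 2ρ > 0,

$$ \sum_t \frac{d_t}{(1 + 2\rho d_t)(1 + d_t)}\,z_t^2 > 0, \qquad \forall\rho. \tag{72} $$

The event is

$$ \min_\rho \sum_t \frac{d_t}{(1 + 2\rho d_t)(1 + d_t)}\,z_t^2 > 0. \tag{73} $$

Since each coefficient in (73) is decreasing in ρ, the event is

$$ \sum_t \frac{d_t}{(1+d_t)^2}\,z_t^2 > 0. \tag{74} $$

Q.E.D.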

t  (1+d^)^ 


When T = 2, the lower bound is

$$ \Pr\left\{\tfrac29 z_1^2 - 2z_2^2 > 0\right\} = \Pr\left\{\frac{x_1^2}{x_2^2} > 9\,\frac{1-\rho}{1+\rho}\right\} = 1 - \frac2\pi\arctan\left(3\sqrt{\frac{1-\rho}{1+\rho}}\right) = \begin{cases} .205, & \rho = 0, \\ \tfrac13, & \rho = \tfrac12. \end{cases} \tag{75} $$

The lower bounds of .205 and .333 are to be compared with the exact values of .333 and .5, respectively. An analysis similar to that applied to the upper bound (i.e., the local maximum) shows that for ρ = ½, as T → ∞, the lower bound approaches 0.
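The T = 2 case can be checked directly. The sketch below is our code, and the function names are illustrative. It evaluates the lower bound (75) and, for comparison, the exact probability from (35), written as 1 - (2/π) arctan(√3 √((1-ρ)/(1+ρ))), since √3/γ = √3 √((1-ρ)/(1+ρ)).

```python
import math

def exact_T2(rho):
    """Exact Pr{alpha_hat = 1} for T = 2, from (33) and (35)."""
    return 1 - (2 / math.pi) * math.atan(math.sqrt(3) * math.sqrt((1 - rho) / (1 + rho)))

def lower_bound_T2(rho):
    """Lower bound (75): Pr{x1^2 / x2^2 > 9 (1 - rho) / (1 + rho)}."""
    return 1 - (2 / math.pi) * math.atan(3 * math.sqrt((1 - rho) / (1 + rho)))

for rho in (0.0, 0.25, 0.5):
    print(rho, round(lower_bound_T2(rho), 4), round(exact_T2(rho), 4))
```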
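Theorem 5.2. $\hat\alpha_{LS,I}$ is biased toward the origin. $\hat\alpha_{ML}$ and $\hat\alpha_{LS,II}$ are consistent.

It will be shown that if α = 1, then $\hat\alpha_{LS,I}$ converges in probability to √2 - 1 ≈ .414. Because of this bias, $\Pr(\hat\alpha_{LS,I} = 1 \mid \alpha = 1)$ goes to zero, which indicates that the lower bound of Corollary 5.1 is not sharp.

To prove the theorem we need the following lemma.

Lemma 1. Let |a| < 1, |b| < 1, b ≠ 0. Then

$$ J(a;b) = \frac1\pi\int_0^\pi \frac{1 + a\cos\theta}{1 + b\cos\theta}\,d\theta = \frac ab + \left(1 - \frac ab\right)\frac{1}{\sqrt{1-b^2}}. \tag{76} $$

Proof. Differentiating the relation

$$ \int_0^\pi \log(a + b\cos\theta)\,d\theta = \pi\log\frac{a + \sqrt{a^2 - b^2}}{2}, \qquad a > |b|, \tag{77} $$

[Anderson (1971), Problem 69 of Chapter 6] with respect to a and setting a = 1, we obtain

$$ \frac1\pi\int_0^\pi \frac{d\theta}{1 + b\cos\theta} = \frac{1}{\sqrt{1-b^2}}. \tag{78} $$

But (1 + a cos θ)/(1 + b cos θ) = a/b + (1 - a/b)/(1 + b cos θ). The lemma follows. Q.E.D.

Let ρ* be the true autocorrelation coefficient. Then

$$ S_I(\rho) = \sigma(0)\sum_{t=1}^T \frac{1 + 2\rho^* d_t}{1 + 2\rho d_t}\,x_t^2, \tag{79} $$

where the x_t's are independent standard normal variables. Since the factor σ(0) does not affect the minimizing value of ρ, we set σ(0) = 1. We consider S_I(ρ) in the open interval -½ < ρ < ½. Let ρ be fixed for the moment. Then, as T → ∞,

$$ \mathcal{E}_{\rho^*}\left[\frac1T S_I(\rho)\right] = \frac1T\sum_{t=1}^T \frac{1 + 2\rho^* d_t}{1 + 2\rho d_t} \to \int_0^1 \frac{1 + 2\rho^*\cos(\pi s)}{1 + 2\rho\cos(\pi s)}\,ds = \frac1\pi\int_0^\pi \frac{1 + 2\rho^*\cos\theta}{1 + 2\rho\cos\theta}\,d\theta = J(2\rho^*; 2\rho). \tag{80} $$

Furthermore,

$$ \operatorname{Var}\left[\frac1T S_I(\rho)\right] = \frac{2}{T^2}\sum_{t=1}^T\left(\frac{1 + 2\rho^* d_t}{1 + 2\rho d_t}\right)^2 \sim \frac2T\int_0^1\left(\frac{1 + 2\rho^*\cos(\pi s)}{1 + 2\rho\cos(\pi s)}\right)^2 ds \to 0 \tag{81} $$

as T → ∞. Hence for fixed ρ, (1/T)S_I(ρ) converges to J(2ρ*; 2ρ) in probability. Now note that

$$ \frac{\partial^2}{\partial b^2}J(a;b) = \frac2\pi\int_0^\pi \frac{\cos^2\theta\,(1 + a\cos\theta)}{(1 + b\cos\theta)^3}\,d\theta > 0. \tag{82} $$

Hence for given a, J(a;b) is convex in b and has a unique minimum, b̃(a), say. By a standard argument, the ρ̂ which minimizes S_I(ρ) then converges in probability to ½ b̃(2ρ*). An explicit expression for the minimum can be obtained by solving ∂J(a;b)/∂b = 0. Now

$$ \frac{\partial}{\partial b}J(a;b) = \frac{a}{b^2}\left[\frac{1}{\sqrt{1-b^2}} - 1\right] + \frac{b - a}{(1-b^2)^{3/2}}. \tag{83} $$

Setting this equal to 0 and solving for a, we obtain

$$ a = \frac{b^3}{(1-b^2)^{3/2} + 2b^2 - 1}. \tag{84} $$

A plot of this relation in terms of a/2 = ρ* and b/2 = plim $\hat\rho_{LS,I}$ is given in Figure 3.

Figure 3.

This shows that $\hat\rho_{LS,I}$ is heavily downward biased. In particular, if ρ* = ½ (that is, if α = 1), then plim $\hat\rho_{LS,I}$ = 1/(2√2) ≈ .354 (or, in terms of α, plim $\hat\alpha_{LS,I}$ = √2 - 1 ≈ .414).

As to $\hat\alpha_{LS,II}$, note that S_II(α) = S_I(ρ(α))/(1+α²), or, since 1/(1+α²) = ½[1 + √(1 - 4ρ²)] for |α| ≤ 1,

$$ S_{II}(\alpha) = \tfrac12\left[1 + \sqrt{1 - 4\rho^2}\right]S_I(\rho). \tag{85} $$

Hence (1/T)S_II converges in probability to J_II(2ρ*; 2ρ), where

$$ J_{II}(a;b) = \tfrac12\left[1 + \sqrt{1-b^2}\right]J(a;b) = \tfrac12\left[1 + \frac{1 - ab}{\sqrt{1-b^2}}\right]. \tag{86} $$

Then

$$ \frac{\partial}{\partial b}J_{II}(a;b) = \frac{b - a}{2(1-b^2)^{3/2}}. \tag{87} $$

Note that ∂J_II(a;b)/∂b < 0 if b < a and ∂J_II(a;b)/∂b > 0 if b > a. Hence for given a, J_II(a;b) attains a unique minimum at b = a. By a standard argument again, $\hat\rho_{LS,II}$ converges to ρ* in probability, showing that $\hat\rho_{LS,II}$ is consistent (and so is $\hat\alpha_{LS,II}$).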
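Lemma 1 and the limits in (80)-(87) can be checked numerically. The sketch below assumes NumPy, and all function names are illustrative. It compares the closed form (76) with midpoint-rule quadrature. It then locates the minimizer of J(1; b), which corresponds to ρ* = ½, and checks it against (84). Finally, it confirms that J_II(a; b) is minimized at b = a.

```python
import numpy as np

def J_numeric(a, b, n=200_000):
    """(1/pi) * integral_0^pi (1 + a cos t) / (1 + b cos t) dt by the midpoint rule."""
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.mean((1 + a * np.cos(t)) / (1 + b * np.cos(t)))

def J_closed(a, b):
    """Closed form (76) of Lemma 1 (b != 0)."""
    return a / b + (1 - a / b) / np.sqrt(1 - b ** 2)

def J_II(a, b):
    """(86): J_II(a; b) = (1/2) [1 + (1 - a b) / sqrt(1 - b^2)]."""
    return 0.5 * (1 + (1 - a * b) / np.sqrt(1 - b ** 2))

for a, b in ((0.3, 0.5), (0.9, -0.4), (1.0, 0.7)):
    assert abs(J_numeric(a, b) - J_closed(a, b)) < 1e-8

# rho* = 1/2 means a = 1; the minimizer b of J(1; b) is the plim of 2 * rho_hat_LS,I
b = np.linspace(0.01, 0.99, 98_001)
b_min = b[np.argmin(J_closed(1.0, b))]
print(b_min / 2)                                                     # about .354
print(b_min ** 3 / ((1 - b_min ** 2) ** 1.5 + 2 * b_min ** 2 - 1))   # (84): about 1

# J_II(a; b) is minimized at b = a, which is why rho_hat_LS,II is consistent
grid = np.linspace(-0.99, 0.99, 19_801)
for a in (-0.6, 0.2, 0.9):
    assert abs(grid[np.argmin(J_II(a, grid))] - a) < 1e-3
```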

As to the maximum likelihood estimator, consider 1/T times the concentrated likelihood function in terms of α:

$$ \frac1T M_{II}(\alpha) = -\frac1T\log|Q| - \log y'Q^{-1}y = -\frac1T\log|Q| - \log S_{II}(\alpha). \tag{88} $$

Now |Q| = 1 + α² + ... + α^{2T}; hence, for |α| ≤ 1, 0 ≤ (1/T) log|Q| ≤ (1/T) log(T+1) → 0 as T → ∞. Therefore the determinant term is asymptotically negligible, and consistency of $\hat\alpha_{LS,II}$ implies consistency of $\hat\alpha_{ML}$. This proves the theorem. Q.E.D.

Note that log|R| = -T log(1+α²) + log|Q|. Hence (1/T) log|R| → -log(1+α²), which is not asymptotically negligible. This fact explains the inconsistency of $\hat\alpha_{LS,I}$.

As Theorem 5.1 shows, $\hat\alpha_{LS,II}$ is more likely than the maximum likelihood estimator to assume the noninvertible value ±1.
This can also be seen by considering the behavior of S_II(α) at α = ±1. Note that

$$ S_{II}'(\alpha) = -2\sum_{t=1}^T \frac{(\alpha + d_t)\,z_t^2}{(1 + \alpha^2 + 2\alpha d_t)^2}. \tag{89} $$

Hence

$$ S_{II}'(1) = -\frac12\sum_{t=1}^T \frac{z_t^2}{1 + d_t} < 0. \tag{90} $$

Similarly, S_II'(-1) > 0. This shows that α = ±1 is always a local minimum of the quadratic form S_II(α) on [-1, 1]. Ignoring the determinant thus has a detrimental effect on estimation.
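The sign pattern in (89)-(90) holds for every data vector, and a short simulation confirms it. The sketch assumes NumPy. `S_II_prime` is an illustrative name, and it uses the eigenvalues 1 + α² + 2αd_t of Q.

```python
import numpy as np

def S_II_prime(alpha, z, d):
    """Derivative (89) of S_II(alpha) = sum_t z_t^2 / (1 + alpha^2 + 2 alpha d_t)."""
    return -2 * np.sum((alpha + d) * z ** 2 / (1 + alpha ** 2 + 2 * alpha * d) ** 2)

rng = np.random.default_rng(4)
T = 15
d = np.cos(np.pi * np.arange(1, T + 1) / (T + 1))   # eigenvalues of A, (19)
for _ in range(1000):
    z = rng.standard_normal(T)
    assert S_II_prime(1.0, z, d) < 0 < S_II_prime(-1.0, z, d)   # (90) and its mirror image
```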


6.  THE  MOVING  AVERAGE  OF  GENERAL  ORDER 

The results for the general order of moving average are not as clear-cut. We write the process as

$$ y_t = \gamma_0 w_t + \gamma_1 w_{t-1} + \cdots + \gamma_q w_{t-q}, \tag{91} $$

where

$$ \gamma_0 > 0, \qquad \mathcal{E}w_t = 0, \qquad \mathcal{E}w_t^2 = 1, \tag{92} $$

and the w_t's are uncorrelated. Then the autocovariance sequence is

$$ \sigma(h) = \sum_{j=0}^{q-h}\gamma_j\gamma_{j+h}, \quad h = 0, 1, \dots, q; \qquad \sigma(h) = 0, \quad h = q+1, \dots, \tag{93} $$

and σ(-h) = σ(h).

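The covariance matrix of T successive terms of the process is the T × T banded matrix whose (s, t)th element is σ(s - t):

$$ \Sigma_T = \begin{pmatrix} \sigma(0) & \sigma(1) & \cdots & \sigma(q) & 0 & \cdots & 0 \\ \sigma(1) & \sigma(0) & \ddots & & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ \sigma(q) & & \ddots & \ddots & \ddots & & \sigma(q) \\ 0 & \ddots & & \ddots & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & & \ddots & \sigma(0) & \sigma(1) \\ 0 & \cdots & 0 & \sigma(q) & \cdots & \sigma(1) & \sigma(0) \end{pmatrix}. \tag{94} $$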

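We define

$$ \mathcal{S}_{q,T} = \{(\sigma(0), \dots, \sigma(q)) : \Sigma_T \text{ is positive definite}\}, \tag{95} $$

$$ \mathcal{S}_{q,\infty} = \{(\sigma(0), \dots, \sigma(q)) : \Sigma_{T'} \text{ is positive definite for all } T' \ge q+1\}. \tag{96} $$

A vector σ = [σ(0), ..., σ(q)]' belongs to 𝒮_{q,∞} if and only if there exist real γ_0, ..., γ_q, not all 0, such that (93) is satisfied. Alternatively, σ ∈ 𝒮_{q,∞} if and only if

$$ 2\pi f(\lambda) = \sum_{h=-q}^{q}\sigma(h)e^{i\lambda h} \ge 0, \qquad \forall\lambda. \tag{97} $$

Clearly

$$ \mathcal{S}_{q,q+1} \supset \mathcal{S}_{q,q+2} \supset \cdots \supset \mathcal{S}_{q,\infty}. $$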
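A simple numerical illustration of these inclusions follows. The example is ours, not from the text. For q = 1, the pair (σ(0), σ(1)) = (1, 0.55) has autocorrelation 0.55 > ½, so by (8) it is not the autocovariance sequence of any MA(1) process, and (97) fails at λ = π. Yet Σ_T is positive definite for small T. The sketch assumes NumPy, and `is_pd_banded` is a hypothetical helper.

```python
import numpy as np

def is_pd_banded(sigma, T):
    """True if the T x T banded Toeplitz matrix (94) built from sigma(0..q) is positive definite."""
    S = np.zeros((T, T))
    for h, s_h in enumerate(sigma):
        band = np.eye(T, k=h)
        S += s_h * (band + band.T if h > 0 else band)
    return bool(np.linalg.eigvalsh(S).min() > 0)

sigma = [1.0, 0.55]
lam = np.linspace(0.0, np.pi, 1001)
spec = sigma[0] + 2 * sigma[1] * np.cos(lam)        # 2*pi*f(lambda) in (97), q = 1
print(spec.min())                                   # negative, so sigma is not in S_{1,inf}
print([T for T in range(2, 12) if is_pd_banded(sigma, T)])   # [2, 3, 4, 5, 6]
```

So (1, 0.55) lies in 𝒮_{1,T} for T ≤ 6 but in no 𝒮_{1,T} with T ≥ 7, and hence not in 𝒮_{1,∞}.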


An alternative set of parameters consists of γ_0, θ_1, ..., θ_q, the roots of

$$ \gamma_0 z^q + \gamma_1 z^{q-1} + \cdots + \gamma_q = 0. \tag{98} $$

There is no loss in generality in requiring |θ_i| ≤ 1, i = 1, ..., q. [See Section 7.5.2 of Anderson (1971).] A moving average process is invertible if |θ_i| < 1, i = 1, ..., q. It is noninvertible if |θ_i| = 1 for at least one value of i.

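The derivatives of the log likelihood function with respect to γ_0, ..., γ_q are

$$ \left(\frac{\partial\log L}{\partial\gamma_0}, \dots, \frac{\partial\log L}{\partial\gamma_q}\right) = \left(\frac{\partial\log L}{\partial\sigma(0)}, \dots, \frac{\partial\log L}{\partial\sigma(q)}\right)\begin{pmatrix} \dfrac{\partial\sigma(0)}{\partial\gamma_0} & \cdots & \dfrac{\partial\sigma(0)}{\partial\gamma_q} \\ \vdots & & \vdots \\ \dfrac{\partial\sigma(q)}{\partial\gamma_0} & \cdots & \dfrac{\partial\sigma(q)}{\partial\gamma_q} \end{pmatrix}. \tag{99} $$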

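Let σ̂ = (σ̂(0), ..., σ̂(q))' satisfy

$$ 0 = \left(\frac{\partial\log L}{\partial\sigma(0)}, \dots, \frac{\partial\log L}{\partial\sigma(q)}\right). \tag{100} $$

If there is no such σ̂ corresponding to a solution γ̂ of (99) set equal to 0, then we must have |J| = 0 for γ = γ̂, where

$$ J = \left(\frac{\partial\sigma(h)}{\partial\gamma_j}\right), \qquad h, j = 0, 1, \dots, q, \tag{101} $$

is the Jacobian matrix and γ̂ = (γ̂_0, ..., γ̂_q) satisfies (99) set equal to 0. We shall show that in this case γ̂ corresponds to a noninvertible process.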

Theorem 6.1. Let θ_1, ..., θ_q denote the roots of (98). Then

$$ |J| = 2\gamma_0^{q+1}\prod_{i=1}^q(1 - \theta_i)\prod_{i=1}^q(1 + \theta_i)\prod_{i<j}(1 - \theta_i\theta_j). \tag{102} $$

We see that |J| vanishes if and only if the characteristic polynomial has at least one root of absolute value 1. Hence we have

Corollary 6.1. The Jacobian |J| vanishes at γ = (γ_0, ..., γ_q) if and only if the corresponding process (91) is noninvertible.

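Proof of Theorem 6.1. The (h, j)th element of J is ∂σ(h)/∂γ_j = γ_{j+h} + γ_{j-h}, with γ_k = 0 for k < 0 or k > q. Hence the Jacobian is

$$ |J| = \begin{vmatrix} 2\gamma_0 & 2\gamma_1 & \cdots & 2\gamma_q \\ \gamma_1 & \gamma_0 + \gamma_2 & \cdots & \gamma_{q-1} \\ \vdots & \vdots & & \vdots \\ \gamma_{q-1} & \gamma_q & \cdots & \gamma_1 \\ \gamma_q & 0 & \cdots & \gamma_0 \end{vmatrix} = 2\gamma_0^{q+1}\begin{vmatrix} 1 & a_1 & \cdots & a_q \\ a_1 & 1 + a_2 & \cdots & a_{q-1} \\ \vdots & \vdots & & \vdots \\ a_{q-1} & a_q & \cdots & a_1 \\ a_q & 0 & \cdots & 1 \end{vmatrix}, \tag{103} $$

where a_i = γ_i/γ_0, i = 0, 1, ..., q. We can write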


$$ 1 + a_1x + \cdots + a_qx^q = (1 - \theta_1x)\cdots(1 - \theta_qx); \tag{104} $$

that is, a_1 = -Σ_i θ_i, a_2 = Σ_{i<j} θ_iθ_j, etc. Then the last determinant becomes a polynomial in the θ_i's when the a_i's are expressed in terms of the θ_i's.

Now take the last determinant in (103) with row 0 multiplied by 2, so that row 0 is (2, 2a_1, ..., 2a_q). We multiply row i by θ_1^i + θ_1^{-i} and add it to row 0, i = 1, ..., q. Then row 0 becomes

$$ \left(\theta_1^{-j}\sum_{i=0}^q a_i\theta_1^i + \theta_1^{j}\sum_{i=0}^q a_i\theta_1^{-i}\right)_{j=0,1,\dots,q}. \tag{105} $$

This follows from the evaluation of the jth column:

$$ a_j(\theta_1^0 + \theta_1^{-0}) + (a_{j-1} + a_{j+1})(\theta_1 + \theta_1^{-1}) + (a_{j-2} + a_{j+2})(\theta_1^2 + \theta_1^{-2}) + \cdots = \sum_{i=0}^q a_i\left(\theta_1^{i-j} + \theta_1^{j-i}\right) = \theta_1^{-j}\sum_{i=0}^q a_i\theta_1^i + \theta_1^{j}\sum_{i=0}^q a_i\theta_1^{-i}. \tag{106} $$

However, Σ_{i=0}^q a_iθ_1^{-i} = θ_1^{-q}(θ_1^q + a_1θ_1^{q-1} + ... + a_q) = 0 because θ_1 is a root of the characteristic polynomial. Hence the first row becomes

$$ \left(\sum_{i=0}^q a_i\theta_1^i,\ \ \theta_1^{-1}\sum_{i=0}^q a_i\theta_1^i,\ \dots,\ \theta_1^{-q}\sum_{i=0}^q a_i\theta_1^i\right). \tag{107} $$

Hence

$$ \sum_{i=0}^q a_i\theta_1^i = (1 - \theta_1^2)(1 - \theta_1\theta_2)\cdots(1 - \theta_1\theta_q) \tag{108} $$

is a factor of the determinant.

The above operation can be done with the other θ_i's. Hence we see that ∏_{i≤j}(1 - θ_iθ_j) is a factor of the determinant

$$ \begin{vmatrix} 1 & a_1 & \cdots & a_q \\ a_1 & 1 + a_2 & \cdots & a_{q-1} \\ \vdots & \vdots & & \vdots \\ a_q & 0 & \cdots & 1 \end{vmatrix}, \qquad a_1 = -\sum_i\theta_i,\ \dots,\ a_q = (-1)^q\prod_i\theta_i. \tag{109} $$

Now consider the degree of this polynomial in the θ_i's. The highest term comes from

$$ \pm a_q^{q+1} = \pm\prod_i\theta_i^{q+1}. \tag{110} $$

The highest degree term in ∏_{i≤j}(1 - θ_iθ_j) is ±∏_{i≤j}θ_iθ_j = ±∏_i θ_i^{q+1}. These agree. Hence |J| is cγ_0^{q+1}∏_{i≤j}(1 - θ_iθ_j) for some constant c. Considering the constant term, we obtain c = 2. Hence

$$ |J| = 2\gamma_0^{q+1}\prod_{i\le j}(1 - \theta_i\theta_j) = 2\gamma_0^{q+1}\prod_i(1 - \theta_i)\prod_i(1 + \theta_i)\prod_{i<j}(1 - \theta_i\theta_j). \tag{111} $$

Q.E.D.
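Formula (111) can be verified numerically for random coefficients. The sketch below assumes NumPy; `jacobian` and `det_formula` are illustrative names. It builds J from ∂σ(h)/∂γ_j = γ_{j+h} + γ_{j-h} and takes the θ_i from `numpy.roots`, which returns the roots of γ_0 z^q + ... + γ_q as in (98). Because the identity is polynomial, it is checked here without requiring |θ_i| ≤ 1.

```python
import numpy as np

def jacobian(gamma):
    """J[h, j] = d sigma(h) / d gamma_j = gamma_{j+h} + gamma_{j-h}, with gamma_k = 0 outside 0..q."""
    q = len(gamma) - 1
    def g(k):
        return gamma[k] if 0 <= k <= q else 0.0
    return np.array([[g(j + h) + g(j - h) for j in range(q + 1)] for h in range(q + 1)])

def det_formula(gamma):
    """Right-hand side of (111): 2 gamma_0^{q+1} prod_{i <= j} (1 - theta_i theta_j)."""
    q = len(gamma) - 1
    theta = np.roots(gamma)            # roots of gamma_0 z^q + ... + gamma_q, as in (98)
    prod = np.prod([1 - theta[i] * theta[j] for i in range(q) for j in range(i, q)])
    return 2 * gamma[0] ** (q + 1) * prod.real

rng = np.random.default_rng(3)
for q in range(1, 6):
    gamma = rng.standard_normal(q + 1)
    gamma[0] = abs(gamma[0]) + 0.5     # gamma_0 > 0, as in (92)
    assert np.isclose(np.linalg.det(jacobian(gamma)), det_formula(gamma))
```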

Now let us find the null vector of the Jacobian matrix J when it is singular. We have shown that |J| = 0 if and only if (γ_0, ..., γ_q) corresponds to a noninvertible process. The proof gives an explicit expression for a null vector n such that

$$ n'J = 0'. \tag{112} $$

Suppose that γ = (γ_0, ..., γ_q)' corresponds to a noninvertible process. Then there exists a frequency ν (0 ≤ ν ≤ π) such that the spectral density is zero at ν; that is,

$$ 2\pi f(\nu) = \left|\sum_{j=0}^q \gamma_j e^{i\nu j}\right|^2 = \sigma(0) + 2\sigma(1)\cos\nu + \cdots + 2\sigma(q)\cos q\nu = 0. \tag{113} $$

It follows that Σ_{j=0}^q γ_j e^{iνj} = 0 and Σ_{j=0}^q γ_j e^{-iνj} = 0. Let θ_1 = e^{iν} in the proof. Then θ_1^j + θ_1^{-j} = 2 cos jν. Let
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n'  =  (1.  02  +02^ . ej  + 

=  (1,  2  cos  V,...,  2  cos  qv)  . 
Then  the  above  proof  shows  that 


(115)  (2, 


=  (1, 

=  n'J 

Therefore  we  have  proved: 

Theorem  6.2,  Let  Yg>«**>Yq  correspond  to  a  noninvertible  process. 
Let  V  be  such  that  the  spectraV density  f(v)=0.  Then 
n'  =(1,  2  cos  v,...,2  cos  qv)  is  a  null  vector  (from  the  left)  of  the 
Jacobian  matrix  J  . 

This  theorem  can  be  given  an  alternative  proof  as  follows. 

Let 

p  p 

(116)  27rf(v;  aQ,...,aq)  =  (aQ+  ...  +a‘^)  +2(aQa2  +  ...  +aq_23q)  cos  v 

+  . . .  +2  a^a  cos(vq)  . 

0  q  '  -I' 


2  cos  V,  . . .  ,  2  cos  qv) 


YqjYis  •..Y 


Y- 


Y, 


q 

^q-l 

Yn 


[  ^0’2T2.-  • .  .2Yg 


2  COS  V,  ...  ,  2  cos  qv) 


Y, 


X 


q-1 


yq 


=  O'  . 


37 


Then  f (v;  Bq, . . . .a^)  > 0  for  all  (real)  aQ,...,aq.  By  assumption 
f(v;  Yo»--->Yq)  =0  •  •-et  =  Yp* =  Yq  be  fixed  and  consider 
f(v;  aQ,  Y3^.--->Yq)  as  a  function  of  ap  .  It  attains  a  minimum  (=0) 
at  aQ=Yg  .  Hence 

(117)  (v;  Yo>Yi»*  •  •  >Yq)  =  0  • 

This  gives 


(118) 


(1,  2  cos  V 


,2  cos  qv) 


^1 


=  0  . 


Similarly  considering  Yp'-^jYq  sequence  we  obtain 


(119)  (1,  2  cos  V,...,  2  cos  qv)  J=0'  . 

This  completes  the  alternative  proof. 
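Theorem 6.2 is easy to check numerically. The Python sketch below (our own illustration with NumPy; `jacobian` and `null_vector` are hypothetical helper names) uses a hypothetical noninvertible MA(3) whose characteristic polynomial has the roots $e^{\pm i\nu}$ and $0.4$. It builds $J$ from $J_{hk} = \gamma_{k-h}+\gamma_{k+h}$, confirms that the spectral density vanishes at $\nu$, and checks that $(1, 2\cos\nu, 2\cos 2\nu, 2\cos 3\nu)J = \mathbf{0}'$.

```python
import numpy as np

def jacobian(gamma):
    """J[h, k] = d sigma(h) / d gamma_k = gamma_{k-h} + gamma_{k+h} (zero outside 0..q)."""
    gamma = np.asarray(gamma, dtype=float)
    q = len(gamma) - 1
    g = lambda m: gamma[m] if 0 <= m <= q else 0.0
    return np.array([[g(k - h) + g(k + h) for k in range(q + 1)]
                     for h in range(q + 1)])

def null_vector(nu, q):
    """n' = (1, 2 cos nu, ..., 2 cos q nu), as in Theorem 6.2."""
    return np.array([1.0] + [2.0 * np.cos(h * nu) for h in range(1, q + 1)])

# Hypothetical noninvertible MA(3): unit roots e^{+-i nu} plus a real root 0.4;
# gamma holds the (real) coefficients of prod_i (z - theta_i).
nu = 0.7
gamma = np.real(np.poly([np.exp(1j * nu), np.exp(-1j * nu), 0.4]))
# 2*pi*f(nu) = |sum_j gamma_j e^{i nu j}|^2, as in (113)
spec = abs(np.polyval(gamma[::-1], np.exp(1j * nu))) ** 2
print(spec, null_vector(nu, 3) @ jacobian(gamma))
```

Here $J$ has rank $q = 3$, so by the argument leading to Theorem 6.3 this null vector is unique up to scale.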

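Now recall the likelihood equation:

$$\left(\frac{\partial\log L}{\partial\sigma(0)},\ \ldots,\ \frac{\partial\log L}{\partial\sigma(q)}\right)J = \mathbf{0}'. \tag{120}$$

If we assume that the rank of $J$ is $q$, then the null vector is unique up to a multiplicative constant. Hence we have:

Theorem 6.3. Suppose that the likelihood equation

$$\left(\frac{\partial\log L}{\partial\gamma_0},\ \ldots,\ \frac{\partial\log L}{\partial\gamma_q}\right) = (0,\ldots,0)$$

has a noninvertible solution $\gamma_0,\ldots,\gamma_q$ and that the rank of $J$ is $q$. Then there exists a unique $\nu$ ($0\le\nu\le\pi$) such that the spectral density is zero at $\nu$, and

$$\left(\frac{\partial\log L}{\partial\sigma(0)},\ \ldots,\ \frac{\partial\log L}{\partial\sigma(q)}\right) = c\,(1,\ 2\cos\nu,\ \ldots,\ 2\cos q\nu) \tag{121}$$

for some $c$.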


The rank condition of this theorem is a natural one because it corresponds to smoothness of the boundary of the invertibility region. This can be illustrated by considering the MA(2) case. In terms of $(\rho_1,\rho_2)$ the region has a smooth boundary except at the point $(\rho_1,\rho_2) = (0,-\tfrac12)$. See Figure 5 of the next section. Consider the Jacobian matrix

$$J = \begin{pmatrix} 2\gamma_0 & 2\gamma_1 & 2\gamma_2\\ \gamma_1 & \gamma_0+\gamma_2 & \gamma_1\\ \gamma_2 & 0 & \gamma_0\end{pmatrix}. \tag{122}$$

$J$ is of rank 1 if and only if row 1 is proportional to row 3 and row 2 is proportional to row 3. But this implies $\gamma_1 = 0$ and $\gamma_0+\gamma_2 = 0$, that is, $\gamma_1 = 0$, $\gamma_2 = -\gamma_0$. (Conversely, if $\gamma_1 = 0$ and $\gamma_2 = -\gamma_0$, then $J$ is of rank 1.) Hence this case corresponds to $(\rho_1,\rho_2) = (0,-\tfrac12)$. For other boundary points the rank of $J$ is 2.
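The rank statements for (122) can be confirmed numerically. The Python sketch below (our own illustration with NumPy; the three parameter points and the helper names are our choices) evaluates $J$ at three noninvertible MA(2) points. The first is the corner point $\gamma_1 = 0,\ \gamma_2 = -\gamma_0$. The second has a root $-1$ and lies on the edge $\rho_2-\rho_1 = -\tfrac12$. The third has complex unit roots and lies on the curved edge $\rho_1^2 = 4\rho_2(1-2\rho_2)$. For each point the sketch reports $(\rho_1,\rho_2)$ and the numerical rank.

```python
import numpy as np

def jacobian_ma2(g0, g1, g2):
    """The Jacobian (122) of (sigma(0), sigma(1), sigma(2)) w.r.t. (gamma_0, gamma_1, gamma_2)."""
    return np.array([[2 * g0, 2 * g1, 2 * g2],
                     [g1, g0 + g2, g1],
                     [g2, 0.0, g0]])

def rho12(g0, g1, g2):
    """(rho_1, rho_2) implied by the MA(2) coefficients."""
    s0, s1, s2 = g0**2 + g1**2 + g2**2, g0 * g1 + g1 * g2, g0 * g2
    return s1 / s0, s2 / s0

cases = {
    "corner": (1.0, 0.0, -1.0),                          # gamma_1 = 0, gamma_2 = -gamma_0
    "root -1": (1.0, 0.5, -0.5),                         # z^2 + 0.5 z - 0.5 = (z + 1)(z - 0.5)
    "complex unit roots": (1.0, -2 * np.cos(0.9), 1.0),  # roots e^{+-0.9 i}
}
for name, g in cases.items():
    print(name, rho12(*g), np.linalg.matrix_rank(jacobian_ma2(*g)))
```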
Geometric interpretation.

The above results can be interpreted from a geometric viewpoint. Consider the set $S_q^{\infty}$ again. This set is convex. This follows from the fact that $(\sigma(0),\ldots,\sigma(q)) \in S_q^{\infty}$ if and only if $\sigma(0) + 2\sigma(1)\cos\lambda + \cdots + 2\sigma(q)\cos(q\lambda) \ge 0$ for all $\lambda$, so that $S_q^{\infty}$ is an intersection of closed half-spaces. Let $\boldsymbol{\sigma}^0 = (\sigma_0^0,\ldots,\sigma_q^0) = (\sigma(0;\boldsymbol{\gamma}),\ldots,\sigma(q;\boldsymbol{\gamma}))$ be a boundary point of $S_q^{\infty}$ and let $P$ be a supporting hyperplane at $\boldsymbol{\sigma}^0$:

$$(\sigma(0)-\sigma_0^0)c_0 + \cdots + (\sigma(q)-\sigma_q^0)c_q \le 0 \tag{123}$$

for all $(\sigma(0),\ldots,\sigma(q)) \in S_q^{\infty}$. See Figure 4. Here $\mathbf{c}' = (c_0,\ldots,c_q)$ is the normal vector to the supporting hyperplane.

[Figure 4: the supporting hyperplane $P$ at a boundary point of $S_q^{\infty}$, with normal vector $\mathbf{c}' = (c_0,\ldots,c_q)$.]

Actually $\sigma_0^0c_0 + \cdots + \sigma_q^0c_q = 0$ because $S_q^{\infty}$ is a cone. This can be verified by letting $(\sigma(0),\ldots,\sigma(q)) = t(\sigma_0^0,\ldots,\sigma_q^0)$ and considering (123) for $t > 1$ and $0 < t < 1$. Let $\gamma_1,\ldots,\gamma_q$ be fixed and let $\gamma_0$ be changed by a small amount $\Delta\gamma_0$. Then, to first order, $\sigma(i) - \sigma_i^0 = \Delta\gamma_0\,\partial\sigma(i)/\partial\gamma_0$, $i = 0,\ldots,q$. Substituting this into (123) we have $\Delta\gamma_0\sum_i c_i\,\partial\sigma(i)/\partial\gamma_0 \le 0$. Since $\Delta\gamma_0$ can be positive or negative, $\sum_i c_i\,\partial\sigma(i)/\partial\gamma_0 = 0$. Namely, the infinitesimal displacement of $(\sigma(0),\ldots,\sigma(q))$ lies in the supporting hyperplane. This consideration can be applied to $\gamma_1,\ldots,\gamma_q$ as well. In matrix form we then have

$$\mathbf{c}'J = \mathbf{0}'. \tag{124}$$

Namely, the normal vector to the supporting hyperplane is a null vector of $J$.
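For the MA(2) case the supporting-hyperplane picture can be checked numerically. The Python sketch below (our own illustration with NumPy; the frequency $\nu = 0.9$ and the sample size are arbitrary choices) takes the boundary point $\boldsymbol{\sigma}^0$ generated by $\boldsymbol{\gamma} = (1, -2\cos\nu, 1)$, whose spectral density vanishes at $\nu$. It uses $\mathbf{c} = -\mathbf{n}$, with $\mathbf{n} = (1, 2\cos\nu, 2\cos 2\nu)'$ from Theorem 6.2, as the normal vector. Because $\mathbf{n}'\boldsymbol{\sigma} = 2\pi f(\nu) \ge 0$ for every MA(2) autocovariance vector, (123) should hold with this choice, and $\mathbf{c}'\boldsymbol{\sigma}^0$ should be $0$, as the cone argument requires.

```python
import numpy as np

def sigma_from_gamma(gamma):
    """sigma(h) = sum_{j=0}^{q-h} gamma_j gamma_{j+h}, for h = 0, ..., q."""
    gamma = np.asarray(gamma, dtype=float)
    q = len(gamma) - 1
    return np.array([gamma[: q + 1 - h] @ gamma[h:] for h in range(q + 1)])

nu = 0.9                                            # assumed zero of the spectral density
sigma0 = sigma_from_gamma([1.0, -2.0 * np.cos(nu), 1.0])   # boundary point of S_2^infinity
n = np.array([1.0, 2.0 * np.cos(nu), 2.0 * np.cos(2.0 * nu)])
c = -n                                              # sign chosen so that (123) reads <= 0

# Random MA(2) autocovariance vectors should all lie on one side of the hyperplane.
rng = np.random.default_rng(0)
sample = [sigma_from_gamma(g) for g in rng.normal(size=(1000, 3))]
worst = max((s - sigma0) @ c for s in sample)
print(sigma0 @ c, worst)
```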
Hence, under the assumptions of the last theorem, we have:

Theorem 6.4. Let the assumptions of Theorem 6.3 hold. Then there exists a unique supporting hyperplane at the boundary point, and the gradient of $\log L$ with respect to $\sigma(0),\ldots,\sigma(q)$ is proportional to the normal vector of the hyperplane.

This theorem implies that if the boundary point $\boldsymbol{\sigma}^0$ corresponds to a relative maximum of the likelihood function, then the likelihood in terms of $(\sigma(0),\ldots,\sigma(q))$ increases most steeply in the direction orthogonal to $P$.

7. THE MOVING AVERAGE OF ORDER 2

The results of the previous section can be illustrated by considering the MA(2) process. The region of $(\rho_1,\rho_2)$ corresponding to MA(2) processes is given in Section 3.4 of Box and Jenkins (1976). It is the intersection of $S_2^{\infty}$ of the previous section with the plane $\sigma(0) = 1$. The boundary is given by $\rho_2 - \rho_1 = -\tfrac12$, $\rho_2 + \rho_1 = -\tfrac12$, and $\rho_1^2 = 4\rho_2(1-2\rho_2)$, corresponding to a root $-1$, a root $1$, and two complex conjugate roots of absolute value 1, respectively. For $T = 3$, $\Sigma_3$ is positive semidefinite if and only if $-1 \le \rho_2 \le 1$ and $\rho_1^2 \le (1+\rho_2)/2$. For $T = 4$, $\Sigma_4$ is positive semidefinite if and only if

$$(\rho_1+\rho_2)^2 \le 1+\rho_1 \quad\text{and}\quad (\rho_1-\rho_2)^2 \le 1-\rho_1. \tag{125}$$

For larger $T$, an explicit expression for the positive semidefiniteness of $\Sigma_T$ seems difficult to obtain. In Figure 5 the boundaries corresponding to $T = 3,4,5,6,\infty$ are plotted; for $T = 5,6$ the boundaries are computed numerically. We see that for finite $T$ the regions are strictly larger than the region arising from MA(2) processes ($T = \infty$). For $\rho_2 = 0$, the boundary points are the same as in the MA(1) case, namely $\rho_1 = \pm 1/[2\cos(\pi/(T+1))]$. For $\rho_1 = 0$, explicit expressions for the limits can be obtained as follows. If $\rho_1 = 0$, then $y_1, y_3, y_5,\ldots$ (odd indices) and $y_2, y_4, y_6,\ldots$ (even indices) are uncorrelated, and each subseries has the correlation structure of an MA(1) process with first-order correlation $\rho_2$. Hence $\Sigma_T$ is positive semidefinite if and only if the submatrices for the even indices and for the odd indices are both positive semidefinite. It follows that the limits are given by

$$\rho_2 = \pm\frac{1}{2\cos\left(\pi/\left([(T+1)/2]+1\right)\right)}, \tag{126}$$

where $[(T+1)/2]$ equals $T/2$ if $T$ is even and $(T+1)/2$ if $T$ is odd.

[Figure 5: boundaries of the region in the $(\rho_1,\rho_2)$-plane where $\Sigma_T$ is positive semidefinite, for $T = 3,4,5,6,\infty$.]

A detailed analysis of the full likelihood function of the MA(2) process seems difficult to carry out.
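The finite-$T$ conditions can be checked against a direct eigenvalue test. In the Python sketch below (our own illustration with NumPy; the function names and the bisection approach are ours), $\Sigma_T$ is taken to be the $T\times T$ Toeplitz matrix with first row $(1,\rho_1,\rho_2,0,\ldots,0)$, and positive semidefiniteness is decided by its smallest eigenvalue with a small tolerance. The sketch compares this test with the conditions $-1\le\rho_2\le1,\ \rho_1^2\le(1+\rho_2)/2$ for $T=3$ and $(\rho_1\pm\rho_2)^2 \le 1\pm\rho_1$ for $T=4$. It then locates the boundary along each axis by bisection. Bisection is valid here because the set of positive semidefinite Toeplitz matrices is convex and contains $(\rho_1,\rho_2) = (0,0)$.

```python
import numpy as np

def sigma_T(rho1, rho2, T):
    """T x T Toeplitz correlation matrix with lag-1, lag-2 correlations (T >= 3)."""
    row = np.zeros(T)
    row[:3] = (1.0, rho1, rho2)
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return row[lags]

def is_psd(rho1, rho2, T, tol=1e-12):
    return np.linalg.eigvalsh(sigma_T(rho1, rho2, T)).min() >= -tol

def psd_T3(r1, r2):
    return -1 <= r2 <= 1 and r1**2 <= (1 + r2) / 2

def psd_T4(r1, r2):
    return (r1 + r2)**2 <= 1 + r1 and (r1 - r2)**2 <= 1 - r1

def axis_limit(T, axis, hi=1.0, iters=60):
    """Largest r in [0, hi] with (r, 0) (axis=1) or (0, r) (axis=2) PSD, by bisection."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        point = (mid, 0.0) if axis == 1 else (0.0, mid)
        if is_psd(*point, T):
            lo = mid
        else:
            hi = mid
    return lo

for T in (5, 6):
    m = (T + 1) // 2                      # [(T+1)/2] in (126)
    print(T, axis_limit(T, 1), 1 / (2 * np.cos(np.pi / (T + 1))),
          axis_limit(T, 2), 1 / (2 * np.cos(np.pi / (m + 1))))
```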


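8. THE AUTOREGRESSIVE MOVING AVERAGE MODEL

The ARMA(p,q) model is

$$\sum_{j=0}^{p}\beta_j y_{t-j} = \sum_{j=0}^{q}\alpha_j v_{t-j}, \tag{127}$$

where $\alpha_0 = \beta_0 = 1$, and $\{v_t\}$ is a sequence of independent, identically distributed random variables with $\mathcal{E}v_t = 0$ and $\mathcal{E}v_t^2 = \sigma_v^2$. Let

$$u_t = \sum_{j=0}^{q}\alpha_j v_{t-j} = \sum_{j=0}^{q}\gamma_j\varepsilon_{t-j}, \tag{128}$$

where $\gamma_j = \alpha_j\sigma_v$, $j = 0,1,\ldots,q$, and $\varepsilon_t = v_t/\sigma_v$. The autocovariances of the unobservable $\{u_t\}$ process are

$$\sigma(h) = \sum_{j=0}^{q-h}\gamma_j\gamma_{j+h},\quad h = 0,1,\ldots,q; \qquad \sigma(h) = 0,\quad h > q. \tag{129}$$

Let $\boldsymbol{\beta} = (\beta_1,\ldots,\beta_p)'$ and $\boldsymbol{\gamma} = (\gamma_0,\ldots,\gamma_q)'$. Alternative parametrizations are $(\boldsymbol{\beta},\boldsymbol{\gamma})$ (or, equivalently, $\boldsymbol{\beta}$, $\alpha_1,\ldots,\alpha_q$ and $\sigma_v^2$) and $(\boldsymbol{\beta},\sigma(0),\ldots,\sigma(q))$.

The derivatives of the likelihood function with respect to the components of $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ are

$$\left(\frac{\partial L}{\partial\boldsymbol{\beta}'},\ \frac{\partial L}{\partial\boldsymbol{\gamma}'}\right) = \left(\frac{\partial L}{\partial\boldsymbol{\beta}'},\ \frac{\partial L}{\partial\boldsymbol{\sigma}'}\right)\begin{pmatrix} I & 0\\ 0 & J_u\end{pmatrix}, \tag{130}$$

where $\boldsymbol{\sigma} = (\sigma(0),\ldots,\sigma(q))'$ and $J_u = (\partial\boldsymbol{\sigma}/\partial\boldsymbol{\gamma}')$. As in the case of MA(q), if there is a solution of the likelihood equations for $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ such that the corresponding vector $(\partial L/\partial\boldsymbol{\beta}',\ \partial L/\partial\boldsymbol{\sigma}')$ is not $\mathbf{0}'$, then the matrix on the right-hand side of (130) must be singular, that is, $|J_u| = 0$ at this vector $\boldsymbol{\gamma}$. This solution is noninvertible. The analysis of the MA(q) model can be carried over to the ARMA(p,q) model.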

9. THE AUTOREGRESSIVE MOVING-AVERAGE PROCESS OF ORDER 1 AND 1

The considerations of the previous section can be illustrated by the ARMA(1,1) process. Let $y_1,\ldots,y_T$ be $T$ successive observations from an ARMA(1,1) process; that is, the $y_t$'s satisfy

$$y_t + \beta y_{t-1} = v_t + \alpha v_{t-1} = u_t. \tag{131}$$

Let $\mathcal{E}(u_t^2) = \sigma_u^2$ and $\rho = \alpha/(1+\alpha^2)$. In terms of the parametrization $(\beta,\rho,\sigma_u^2)$ the autocovariances of the process are given by

$$\begin{aligned} \sigma(0) &= \sigma_u^2\,\frac{1-2\beta\rho}{1-\beta^2},\\ \sigma(1) &= \sigma_u^2\,\frac{\rho+\beta^2\rho-\beta}{1-\beta^2},\\ \sigma(h+1) &= -\beta\,\sigma(h), \qquad h = 1,2,\ldots,\\ \sigma(-h) &= \sigma(h), \qquad h = 1,2,\ldots. \end{aligned} \tag{132}$$
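The autocovariances (132) can be cross-checked against the moving-average representation $y_t = \sum_k\psi_k v_{t-k}$ implied by (131). For this representation $\psi_0 = 1$, $\psi_1 = \alpha-\beta$ and $\psi_k = -\beta\psi_{k-1}$ for $k\ge2$, and $\sigma_u^2 = \sigma_v^2(1+\alpha^2)$. The Python sketch below (our own illustration with NumPy; the function names, the truncation point `K` and the parameter values are our choices) compares the two computations.

```python
import numpy as np

def acov_132(alpha, beta, sigma_v2, H):
    """sigma(0), ..., sigma(H) from (132); rho = alpha/(1+alpha^2), sigma_u^2 = sigma_v^2 (1+alpha^2)."""
    rho = alpha / (1 + alpha**2)
    su2 = sigma_v2 * (1 + alpha**2)
    s = [su2 * (1 - 2 * beta * rho) / (1 - beta**2),
         su2 * (rho + beta**2 * rho - beta) / (1 - beta**2)]
    while len(s) <= H:
        s.append(-beta * s[-1])              # sigma(h+1) = -beta sigma(h), h >= 1
    return np.array(s[: H + 1])

def acov_psi(alpha, beta, sigma_v2, H, K=2000):
    """Same autocovariances from truncated psi weights of y_t = -beta y_{t-1} + v_t + alpha v_{t-1}."""
    psi = np.empty(K)
    psi[0], psi[1] = 1.0, alpha - beta
    for k in range(2, K):
        psi[k] = -beta * psi[k - 1]
    return np.array([sigma_v2 * (psi[: K - h] @ psi[h:]) for h in range(H + 1)])

print(acov_132(0.6, 0.5, 2.0, 5))
print(acov_psi(0.6, 0.5, 2.0, 5))
```

Because (132) depends on $\alpha$ only through $\rho$ and $\sigma_u^2$, the same comparison also works for a noninvertible $\alpha$ such as $\alpha = 2$.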


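To investigate the $T$-dimensional covariance matrix $\Sigma_T$, it is useful to consider the following transformation:

$$\begin{pmatrix} y_1\\ u_2\\ u_3\\ \vdots\\ u_T\end{pmatrix} = \begin{pmatrix} 1 & & & & \\ \beta & 1 & & & \\ & \beta & 1 & & \\ & & \ddots & \ddots & \\ & & & \beta & 1\end{pmatrix}\begin{pmatrix} y_1\\ y_2\\ y_3\\ \vdots\\ y_T\end{pmatrix}. \tag{133}$$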

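Using $\operatorname{cov}(y_1,u_2) = \operatorname{cov}(u_1-\beta y_0,\,u_2) = \operatorname{cov}(u_1,u_2) = \rho\sigma_u^2$ and $\operatorname{var}(y_1) = \sigma_u^2(1-2\beta\rho)/(1-\beta^2)$, we obtain

$$\operatorname{Var}\begin{pmatrix}y_1\\ u_2\\ u_3\\ \vdots\\ u_T\end{pmatrix} = \sigma_u^2\begin{pmatrix}\frac{1-2\beta\rho}{1-\beta^2} & \rho & & & \\ \rho & 1 & \rho & & \\ & \rho & 1 & \ddots & \\ & & \ddots & \ddots & \rho\\ & & & \rho & 1\end{pmatrix}. \tag{134}$$

Since the transformation (133) is nonsingular, $\Sigma_T$ is positive semidefinite if and only if the matrix on the right-hand side of (134) is positive semidefinite. Now define the determinant of the $T\times T$ matrix

$$D_T = \begin{vmatrix}1 & \rho & & \\ \rho & 1 & \ddots & \\ & \ddots & \ddots & \rho\\ & & \rho & 1\end{vmatrix}. \tag{135}$$

Then the determinant of (134) is $\sigma_u^{2T}$ times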


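$$\begin{vmatrix}\frac{1-2\beta\rho}{1-\beta^2} & \rho & & \\ \rho & 1 & \ddots & \\ & \ddots & \ddots & \rho\\ & & \rho & 1\end{vmatrix} = \frac{1-2\beta\rho}{1-\beta^2}D_{T-1} - \rho^2D_{T-2} = \frac{1}{1-\beta^2}\left(D_T - 2\beta\rho D_{T-1} + \beta^2\rho^2D_{T-2}\right), \tag{136}$$

where the last step uses $D_T = D_{T-1} - \rho^2D_{T-2}$.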
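The transformation (133), the matrix in (134) and the determinant identity (136) can be verified numerically. The Python sketch below (our own illustration with NumPy; helper names and parameter values are our choices) builds $\Sigma_T$ from (132) and applies the bidiagonal matrix of (133). It then compares the determinant of the matrix in (134), divided by $\sigma_u^{2T}$, with $(D_T - 2\beta\rho D_{T-1} + \beta^2\rho^2D_{T-2})/(1-\beta^2)$. The sketch uses $\rho = 0.7 > \tfrac12$ deliberately: the identities are algebraic and hold whether or not $\rho$ comes from an actual MA(1).

```python
import numpy as np

def arma11_acov(beta, rho, su2, T):
    """sigma(0), ..., sigma(T-1) from (132) in the (beta, rho, sigma_u^2) parametrization."""
    s = np.empty(T)
    s[0] = su2 * (1 - 2 * beta * rho) / (1 - beta**2)
    if T > 1:
        s[1] = su2 * (rho + beta**2 * rho - beta) / (1 - beta**2)
    for h in range(2, T):
        s[h] = -beta * s[h - 1]
    return s

def toeplitz_sym(first_row):
    n = len(first_row)
    return np.asarray(first_row)[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]

def D(T, rho):
    """D_T of (135): 1 on the diagonal, rho on the first off-diagonals; D_0 = 1."""
    if T == 0:
        return 1.0
    row = np.zeros(T)
    row[0] = 1.0
    if T > 1:
        row[1] = rho
    return np.linalg.det(toeplitz_sym(row))

def transformed_cov(beta, rho, T):
    """The matrix on the right-hand side of (134), divided by sigma_u^2."""
    row = np.zeros(T)
    row[0] = 1.0
    if T > 1:
        row[1] = rho
    M = toeplitz_sym(row)
    M[0, 0] = (1 - 2 * beta * rho) / (1 - beta**2)
    return M

beta, rho, su2, T = 0.4, 0.7, 1.3, 8
Sigma = toeplitz_sym(arma11_acov(beta, rho, su2, T))
B = np.eye(T) + beta * np.eye(T, k=-1)      # the transformation (133)
M = transformed_cov(beta, rho, T)
lhs = np.linalg.det(M)
rhs = (D(T, rho) - 2 * beta * rho * D(T - 1, rho)
       + beta**2 * rho**2 * D(T - 2, rho)) / (1 - beta**2)
print(np.allclose(B @ Sigma @ B.T, su2 * M), lhs, rhs)
```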


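From Lemma 6.7.9 of Anderson (1971) we have

$$D_T = \frac{1}{\sqrt{1-4\rho^2}}\left[\left(\frac{1+\sqrt{1-4\rho^2}}{2}\right)^{T+1} - \left(\frac{1-\sqrt{1-4\rho^2}}{2}\right)^{T+1}\right]. \tag{137}$$

Here we are interested in the case $|\rho| > \tfrac12$, because if $|\rho| \le \tfrac12$ the covariance matrix is clearly positive definite. Let $\rho > \tfrac12$ without loss of generality. Now

$$\frac{1\pm\sqrt{1-4\rho^2}}{2} = \frac{1\pm i\sqrt{4\rho^2-1}}{2} = \rho(\cos\theta \pm i\sin\theta) = \rho e^{\pm i\theta}, \tag{138}$$

where $\theta = \tan^{-1}\sqrt{4\rho^2-1}$. Then

$$\left(\frac{1\pm\sqrt{1-4\rho^2}}{2}\right)^k = \rho^k(\cos k\theta \pm i\sin k\theta). \tag{139}$$

Therefore

$$\begin{aligned} D_T - 2\beta\rho D_{T-1} + \beta^2\rho^2 D_{T-2} &= \frac{1}{i\sqrt{4\rho^2-1}}\left[\rho^{T+1}\,2i\sin(T+1)\theta - 2\beta\rho\,\rho^{T}\,2i\sin T\theta + \beta^2\rho^2\,\rho^{T-1}\,2i\sin(T-1)\theta\right]\\ &= \frac{2\rho^{T+1}}{\sqrt{4\rho^2-1}}\left[\sin(T+1)\theta - 2\beta\sin T\theta + \beta^2\sin(T-1)\theta\right]. \end{aligned} \tag{140}$$

Hence the determinant is zero if and only if

$$\beta^2\sin(T-1)\theta - 2\beta\sin T\theta + \sin(T+1)\theta = 0. \tag{141}$$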
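A quick numerical check of (137) and (140): the Python sketch below (our own illustration with NumPy and the standard `cmath` module; function names and parameter values are our choices) computes $D_T$ from the recursion $D_T = D_{T-1} - \rho^2D_{T-2}$ with $D_0 = D_1 = 1$, which follows from expanding (135) along its first row. It compares the result with the closed form (137), using complex arithmetic so that the same closed form also covers $\rho > \tfrac12$. For $\rho > \tfrac12$ it also compares $D_T - 2\beta\rho D_{T-1} + \beta^2\rho^2D_{T-2}$ with the trigonometric expression (140).

```python
import cmath
import numpy as np

def D_rec(T, rho):
    """D_T by the recursion D_T = D_{T-1} - rho^2 D_{T-2}, with D_0 = D_1 = 1."""
    d_prev, d = 1.0, 1.0
    if T == 0:
        return d_prev
    for _ in range(T - 1):
        d_prev, d = d, d - rho**2 * d_prev
    return d

def D_137(T, rho):
    """Closed form (137); complex square root handles |rho| > 1/2 (rho != +-1/2)."""
    s = cmath.sqrt(1 - 4 * rho**2)
    val = (((1 + s) / 2) ** (T + 1) - ((1 - s) / 2) ** (T + 1)) / s
    return val.real

def trig_140(T, rho, beta):
    """Right-hand side of (140), valid for rho > 1/2."""
    r = np.sqrt(4 * rho**2 - 1)
    theta = np.arctan(r)
    return 2 * rho**(T + 1) / r * (np.sin((T + 1) * theta)
                                   - 2 * beta * np.sin(T * theta)
                                   + beta**2 * np.sin((T - 1) * theta))

T, rho, beta = 7, 0.6, 0.3
lhs = D_rec(T, rho) - 2 * beta * rho * D_rec(T - 1, rho) + beta**2 * rho**2 * D_rec(T - 2, rho)
print(D_137(T, rho), D_rec(T, rho), trig_140(T, rho, beta), lhs)
```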

Solving (141) for $\beta$ we obtain

Equations (143) and (144) give the boundaries of the region where $\Sigma_T$ is positive semidefinite in the $(\rho,\beta)$-plane. For $\beta = 1$ we have $\theta = 0$, or $\rho = \tfrac12$. For $\beta = 0$ we have the MA(1) process, and we obtain $\rho = 1/[2\cos(\pi/(T+1))]$; this can be verified by setting $\cos\frac{T+1}{2}\theta = 0$. For $\beta = -1$ we have $\cos\frac{T+1}{2}\theta = -\cos\frac{T-1}{2}\theta$, or $T\theta/2 = \pi/2$. Hence $\theta = \pi/T$, or $\rho = 1/[2\cos(\pi/T)]$. A plot of the boundaries is given in Figure 6.

For $\rho < 0$ the regions are symmetric about the origin. Again we see that for finite $T$ the region is strictly larger than the region corresponding to ARMA(1,1) processes ($T = \infty$).
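The roots of the quadratic (141) can also be written in closed form. Using $\sin^2T\theta - \sin(T-1)\theta\,\sin(T+1)\theta = \sin^2\theta$ and the sum-to-product formulas, one finds (our own derivation) the two roots $\beta = \sin\frac{T+1}{2}\theta\,\big/\sin\frac{T-1}{2}\theta$ and $\beta = \cos\frac{T+1}{2}\theta\,\big/\cos\frac{T-1}{2}\theta$. The second root reproduces the special cases $\beta = 1, 0, -1$ discussed above. The Python sketch below (with NumPy; function names and the choice $T = 6$, $\rho = 0.55$ are ours) checks numerically that both roots satisfy (141) and that the determinant (136) vanishes at the root lying in $(-1,1)$.

```python
import numpy as np

def D_rec(T, rho):
    """D_T from D_T = D_{T-1} - rho^2 D_{T-2}, D_0 = D_1 = 1."""
    d_prev, d = 1.0, 1.0
    if T == 0:
        return d_prev
    for _ in range(T - 1):
        d_prev, d = d, d - rho**2 * d_prev
    return d

def boundary_betas(T, rho):
    """theta and the two closed-form roots of (141), for rho > 1/2."""
    theta = np.arccos(1.0 / (2.0 * rho))    # same as arctan(sqrt(4 rho^2 - 1))
    b1 = np.sin((T + 1) * theta / 2) / np.sin((T - 1) * theta / 2)
    b2 = np.cos((T + 1) * theta / 2) / np.cos((T - 1) * theta / 2)
    return theta, b1, b2

def quadratic_141(T, theta, beta):
    return beta**2 * np.sin((T - 1) * theta) - 2 * beta * np.sin(T * theta) + np.sin((T + 1) * theta)

def det_136(T, rho, beta):
    return (D_rec(T, rho) - 2 * beta * rho * D_rec(T - 1, rho)
            + beta**2 * rho**2 * D_rec(T - 2, rho)) / (1 - beta**2)

T, rho = 6, 0.55
theta, b1, b2 = boundary_betas(T, rho)
print(b1, b2, quadratic_141(T, theta, b1), quadratic_141(T, theta, b2), det_136(T, rho, b2))
```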




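Appendix A

A1. We evaluate $\frac{1}{2}\frac{d}{d\rho}\log|R|$ at $\rho = \tfrac12$, where $R = R_T$ is the $T\times T$ matrix with 1 on the diagonal and $\rho$ on the first off-diagonals, so that $|R_T| = D_T$. Since $\log|R|$ is continuously differentiable in $\rho$, we can let $\rho$ approach $\tfrac12$ from above. Then, as in Section 9,

$$|R_T| = \rho^{T}\frac{\sin(T+1)\theta}{\sin\theta} = \rho^{T}\,\frac{\sin(T\theta)\cos\theta + \cos(T\theta)\sin\theta}{\sin\theta} = \frac{\rho^{T-1}}{2}\,\frac{\sin T\theta}{\sin\theta} + \rho^{T}\cos(T\theta) = \tfrac12|R_{T-1}| + \rho^{T}\cos(T\theta),$$

where $\theta = \tan^{-1}[(4\rho^2-1)^{1/2}]$ and $\cos\theta = 1/(2\rho)$. Hence

$$\frac{d}{d\rho}|R_T| = \frac12\frac{d}{d\rho}|R_{T-1}| + T\rho^{T-1}\cos(T\theta) - T\rho^{T}\sin(T\theta)\,\frac{d\theta}{d\rho}.$$

Now

$$\frac{d\theta}{d\rho} = \frac{1}{1+(4\rho^2-1)}\cdot\frac{8\rho}{2(4\rho^2-1)^{1/2}} = \frac{1}{\rho(4\rho^2-1)^{1/2}} = \frac{1}{2\rho^2\sin\theta}.$$

Letting $\rho\to\tfrac12$ (hence $\theta\to0$ and $\sin T\theta/\sin\theta\to T$) we have

$$\frac{d}{d\rho}|R_T|\,\Big|_{\rho=1/2} = \frac12\frac{d}{d\rho}|R_{T-1}|\,\Big|_{\rho=1/2} - T(T-1)2^{-T+1},$$

or

$$a_T = a_{T-1} - 2T(T-1),$$

where

$$a_T = 2^{T}\frac{d}{d\rho}D_T\,\Big|_{\rho=1/2}.$$

Hence, since $a_1 = 0$ ($D_1 = 1$ does not depend on $\rho$),

$$a_T = -2\sum_{t=1}^{T}t(t-1) = -\tfrac23(T+1)T(T-1).$$

Also

$$D_T\,\Big|_{\rho=1/2} = 2^{-T}(T+1).$$

Combining these we obtain

$$\frac12\frac{d}{d\rho}\log|R|\,\Big|_{\rho=1/2} = \frac{a_T}{2(T+1)} = -\frac{T(T-1)}{3}.$$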
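The result of A1 can be confirmed by numerical differentiation. The Python sketch below (our own illustration with NumPy; the step size `h` and the helper names are our choices) evaluates $\frac12\frac{d}{d\rho}\log|R_T|$ at $\rho = \tfrac12$ by a central difference and compares it with $-T(T-1)/3$. It also checks $D_T = (T+1)2^{-T}$ at $\rho = \tfrac12$. The matrix $R_T$ is still positive definite slightly above $\rho = \tfrac12$ for the values of $T$ used here, so the log-determinant is defined on both sides.

```python
import numpy as np

def log_det_R(T, rho):
    """log|R_T| for the T x T matrix with 1 on the diagonal and rho on the first off-diagonals."""
    R = np.eye(T) + rho * (np.eye(T, k=1) + np.eye(T, k=-1))
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("R_T is not positive definite at this rho")
    return logdet

def half_dlog_at_half(T, h=1e-5):
    """(1/2) d/drho log|R_T| at rho = 1/2, by a central difference."""
    return 0.5 * (log_det_R(T, 0.5 + h) - log_det_R(T, 0.5 - h)) / (2 * h)

for T in (2, 5, 10):
    print(T, half_dlog_at_half(T), -T * (T - 1) / 3)
```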


A2. We have to check that for $T$ sufficiently large the left-hand side of (53) has at least $k$ terms. Now $\ldots > 0$ is equivalent to

$$d_t < -\frac{T-1}{T+2} = -\left(1-\frac{1}{T}\right)\left(1-\frac{2}{T}+\cdots\right) = -1+\frac{3}{T}+O\!\left(\frac{1}{T^2}\right).$$

However,

$$d_{T-j} = -1 + \frac{\pi^2(j+1)^2}{2(T+1)^2} + \cdots.$$

Hence we see that the left-hand side contains about $\sqrt{T}$ terms.